[NTG-pdftex] [PATCH v4] Allow .enc files for bitmap PK fonts

Pali Rohár pali.rohar at gmail.com
Fri Dec 15 21:27:03 CET 2017


On Friday 15 December 2017 20:47:30 Hans Hagen wrote:
> On 12/15/2017 7:12 PM, Pali Rohár wrote:
> > On Friday 15 December 2017 17:13:22 Karl Berry wrote:
> > > (Sorry for the delayed reply.)
> > > 
> > >      Date: Sat, 19 Aug 2017 16:02:17 +0200
> > >      From: Pali Rohár <pali.rohar at gmail.com>
> > >      Subject: [PATCH v4] Allow .enc files for bitmap PK fonts
> > > 
> > > Thanks for splitting the patch into those separate pieces, Pali, and
> > > doing the test and documentation updates. Very helpful. Reading through
> > > the changes, they generally look fine.
> > > 
> > > My only question at the moment is, why do duplicate glyph names have to
> > > be removed in advance (in patch 3)? Otherwise we'll try to put two
> > > glyphs by the same (PostScript/PDF) name in the output font? Or
> > > something else? --thanks, karl.
> > 
> > Hi! Glyph names are put into the /Differences PDF array, and the glyphs
> > themselves are identified in PDF by their names. So we cannot have two
> > different glyphs with the same name in a PDF file.
> 
> Where does the PDF standard mention that limitation? Why should glyph names
> be unique? If there is some encoding issue, it looks more like there is a
> shared Differences-related dictionary / array that should not be shared.

In the /Differences array you assign a glyph name to each character code.
Then in /CharProcs (for a Type 3 font) you assign a glyph definition to
each glyph name.

/CharProcs is a PDF dictionary (page 421 of the PDF Reference, version
1.7), and it is undefined what happens if a PDF dictionary contains the
same key twice (page 59).

Basically, a glyph is identified by its name, not by its character code,
so two different character codes need two different glyph names (if
those character codes render differently).
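To illustrate why the names must be unique, here is a toy C model of a Type 3 font. It is not pdftex code: CharProcs, charprocs_put() and render() are hypothetical names made up for this sketch. Since the PDF Reference leaves duplicate dictionary keys undefined, the model simply assumes that a later key overwrites an earlier one; a real viewer might behave differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model: /Differences maps character code -> glyph name,
   /CharProcs maps glyph name -> glyph procedure. Assumption: a
   duplicate key in /CharProcs overwrites the earlier entry. */

#define MAX_ENTRIES 256

typedef struct {
    const char *name;   /* key in /CharProcs */
    const char *proc;   /* stands in for the glyph's content stream */
} CharProc;

typedef struct {
    CharProc entries[MAX_ENTRIES];
    size_t count;
} CharProcs;

/* Insert or overwrite: the dictionary cannot hold the same key twice. */
static void charprocs_put(CharProcs *d, const char *name, const char *proc)
{
    for (size_t i = 0; i < d->count; i++) {
        if (strcmp(d->entries[i].name, name) == 0) {
            d->entries[i].proc = proc;   /* the earlier glyph is lost */
            return;
        }
    }
    assert(d->count < MAX_ENTRIES);
    d->entries[d->count].name = name;
    d->entries[d->count].proc = proc;
    d->count++;
}

static const char *charprocs_get(const CharProcs *d, const char *name)
{
    for (size_t i = 0; i < d->count; i++)
        if (strcmp(d->entries[i].name, name) == 0)
            return d->entries[i].proc;
    return NULL;
}

/* Render a character code the way a viewer does: code -> name -> proc. */
static const char *render(const char *const differences[],
                          const CharProcs *d, int code)
{
    const char *name = differences[code];
    if (name == NULL)
        return NULL;    /* code not present in /Differences */
    return charprocs_get(d, name);
}
```

Assuming codes 97 and 98 are both named mychar (as in my.enc in this thread), the dictionary ends up with a single mychar entry, and both codes render whichever glyph was inserted last.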

> > The function remove_duplicate_glyph_names() just removes duplicate glyph
> > names from the enc file, and later the function writet3() uses, for each
> > glyph index, either the glyph name or, if it is not available (e.g.
> > because of duplicates), the name "a<glyph_index>" (as before). This
> > ensures that every glyph has a unique name in the PDF file.
> > 
> > If you comment out the call to remove_duplicate_glyph_names(), you will
> > see what happens: pdftex would not be able to create a PDF file with two
> > different glyphs of the same name and would store just one glyph. That
> > would result in a damaged PDF font: one glyph would be used for all
> > characters associated with that one glyph name in the enc file,
> > probably the glyph with the highest index.
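A minimal C sketch of the behaviour described above, assuming that the first occurrence of a duplicated name is kept and later ones are dropped (the message does not say which one survives). dedup_glyph_names() and t3_glyph_name() are hypothetical stand-ins for remove_duplicate_glyph_names() and the naming step in writet3(), not the actual patch code; .notdef is treated as "no name", so such codes also get the "a<glyph_index>" fallback.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ENC_SIZE 256

/* Hypothetical: keep the first occurrence of each glyph name and
   clear every later one. ".notdef" entries are never treated as
   duplicates of each other. */
static void dedup_glyph_names(const char *names[ENC_SIZE])
{
    for (int i = 0; i < ENC_SIZE; i++) {
        if (names[i] == NULL || strcmp(names[i], ".notdef") == 0)
            continue;
        for (int j = i + 1; j < ENC_SIZE; j++) {
            if (names[j] != NULL && strcmp(names[i], names[j]) == 0)
                names[j] = NULL;   /* duplicate: use the fallback later */
        }
    }
}

/* Hypothetical naming rule: use the enc name if one survived,
   otherwise fall back to "a<glyph_index>". */
static const char *t3_glyph_name(const char *names[ENC_SIZE], int index,
                                 char *buf, size_t bufsize)
{
    assert(index >= 0 && index < ENC_SIZE);
    if (names[index] != NULL && strcmp(names[index], ".notdef") != 0)
        return names[index];
    snprintf(buf, bufsize, "a%d", index);
    return buf;
}
```

With my.enc from this thread, code 97 keeps the name mychar and code 98 falls back to a98, so /CharProcs gets two distinct keys. A real implementation would also have to make sure the fallback name does not clash with a name that already appears in the enc file.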
> 
> > Test case for reproducing should be easy:
> > 
> > File my.enc:
> > ============
> > /my [
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef
> > /mychar /mychar
> > /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
> > ] def
> > ============
> 
> OK, but that is not related to PDF (as a format)

It is related to the PDF format, see above. More details are in the PDF
Specification itself; in version 1.7 it is section "5.5 Simple Fonts",
starting on page 412.

> but to a bad vector and/or
> pdftex not taking the right one ... is messing around with names (thereby
> obscuring the problem) better than fixing the enc file? After all, now one
> of the glyphs will still have the wrong name.

Basically, there are two different things:

1) Glyph names

2) ToUnicode CMap table

The CMap table maps each character code to a Unicode (code point)
sequence, and PDF viewers should use this table to find the Unicode
code point(s) for each glyph they render.

But in reality there are "not so good" PDF viewers which ignore the
CMap table stored in the PDF file and instead map glyph names to
Unicode code points themselves.

It looks like pdftex currently generates the CMap from glyph names.
In theory, it should be possible to assign a fully unique (even random)
name to every glyph and then put the correct mapping for all character
codes into the CMap table according to the enc file, because the CMap
table does not use glyph names.

Correct PDF viewers, which use the CMap table, will load the character
==> Unicode mapping from it; "not so good" PDF viewers stay broken.
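A rough C sketch of that idea, under these assumptions: the glyph-name to Unicode table stands in for the \pdfglyphtounicode entries (whose second argument is hex, so 269 means U+0269), synthetic_glyph_name() and write_bfchar() are hypothetical helpers rather than pdftex functions, and only BMP code points are handled. The bfchar block is keyed purely by character code, so it stays correct whatever names end up in /Differences.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ENC_SIZE 256
#define MAX_BFCHAR 100   /* CMap limit on entries per bfchar block */

typedef struct {
    const char *glyph;      /* glyph name as used in the .enc file */
    unsigned codepoint;     /* BMP code point, like \pdfglyphtounicode */
} GlyphToUnicode;

/* Hypothetical: any scheme works as long as each code gets its own name. */
static void synthetic_glyph_name(int code, char *buf, size_t size)
{
    snprintf(buf, size, "g%d", code);
}

static int lookup_unicode(const GlyphToUnicode *tab, size_t n,
                          const char *glyph, unsigned *cp)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tab[i].glyph, glyph) == 0) {
            *cp = tab[i].codepoint;
            return 1;
        }
    }
    return 0;
}

/* Write one bfchar block keyed by character code. The mapping follows
   code -> enc name -> Unicode, so it never depends on the (possibly
   synthetic) names written to /Differences and /CharProcs.
   Returns the number of mappings written. */
static int write_bfchar(const char *const enc[ENC_SIZE],
                        const GlyphToUnicode *tab, size_t n,
                        char *out, size_t outsize)
{
    int codes[ENC_SIZE];
    unsigned cps[ENC_SIZE];
    int count = 0;

    for (int c = 0; c < ENC_SIZE; c++) {
        unsigned cp;
        if (enc[c] != NULL && lookup_unicode(tab, n, enc[c], &cp)) {
            codes[count] = c;
            cps[count] = cp;
            count++;
        }
    }
    assert(count <= MAX_BFCHAR);   /* a real writer would split blocks */

    size_t used = (size_t)snprintf(out, outsize, "%d beginbfchar\n", count);
    assert(used < outsize);
    for (int i = 0; i < count; i++) {
        assert(cps[i] <= 0xFFFF);  /* BMP only in this sketch */
        used += (size_t)snprintf(out + used, outsize - used, "<%02X> <%04X>\n",
                                 (unsigned)codes[i], cps[i]);
        assert(used < outsize);
    }
    used += (size_t)snprintf(out + used, outsize - used, "endbfchar\n");
    assert(used < outsize);
    return count;
}
```

For my.enc with \pdfglyphtounicode{mychar}{269}, both codes <61> and <62> map to <0269>, even though the PDF could call the glyphs g97 and g98.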

> > File test.tex:
> > ============
> > \pdfglyphtounicode{mychar}{269}
> > \pdfgentounicode=1
> > \pdfmapline{cmb10 <my.enc}
> > \font\cmb=cmb10
> > \cmb
> > a b
> > \bye
> > ============
> > 
> > And the resulting PDF file would not render glyph 'a' if the function
> > remove_duplicate_glyph_names() were disabled; there would be two glyphs 'b'.

-- 
Pali Rohár
pali.rohar at gmail.com

