[NTG-pdftex] [PATCH v4] Allow .enc files for bitmap PK fonts

Hans Hagen pragma at wxs.nl
Mon Dec 18 11:17:45 CET 2017


On 12/15/2017 9:27 PM, Pali Rohár wrote:

> 1) Glyph names
> 
> 2) CMap encoding table
> 
> In CMap table is mapping from the character code to Unicode (codepoint)
> sequence. And PDF viewers should use this mapping table to assign
> Unicode codepoint for particular glyph which render.
> 
> But reality is that there are "not so good" PDF viewers which ignores
> CMap table stored in PDF file and do some mapping from glyph name to
> Unicode codepoint.

As type 1 can be mapped onto a wide font, the glyph name is probably less
of an issue there, so most of the encoding data can be omitted. In cff 2
even less is needed.

For copy and paste the tounicode is needed, and when it is absent glyph
names play an (unreliable) role. My experience is that acrobat normally
does things right (but has some weird limitations in the renderer), that
mupdf based viewers render perfectly and do a reasonable cut and paste,
and that xpdf and friends are unreliable with cut and paste and have
rendering issues too. So, when you create extra glyph names for type 3
they need to (somehow) obey the adobe logic (alpha.foo alongside alpha),
as appending some number or character will spoil the cut and paste
(depending on the viewer).
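
To make the difference between alpha.foo and something like alpha1 concrete, here is a minimal sketch in Python of how a viewer that falls back on glyph names might resolve them. It assumes the suffix rule from Adobe's glyph naming convention (everything from the first period onward is ignored, underscores separate ligature components, uniXXXX names a code point directly). The tiny GLYPH_TO_UNICODE table and the function name glyph_name_to_text are made up for this illustration and are not taken from any viewer.

```python
import re

# Hypothetical mini glyph list; a real viewer would use the full Adobe Glyph List.
GLYPH_TO_UNICODE = {
    "alpha": "\u03b1",
    "b": "b",
    "f": "f",
    "i": "i",
}

def glyph_name_to_text(name):
    """Return the text a glyph name resolves to, or None if it cannot be resolved."""
    # Drop the suffix: everything from the first period onward is ignored.
    base = name.split(".", 1)[0]
    if not base:
        return None  # e.g. ".notdef" has no base name left
    chars = []
    # Underscores separate the components of a ligature name.
    for component in base.split("_"):
        if component in GLYPH_TO_UNICODE:
            chars.append(GLYPH_TO_UNICODE[component])
            continue
        # "uniXXXX": four uppercase hex digits give the code point.
        match = re.fullmatch(r"uni([0-9A-F]{4})", component)
        if match is None:
            return None  # unknown name, so cut and paste breaks here
        chars.append(chr(int(match.group(1), 16)))
    return "".join(chars)

print(glyph_name_to_text("alpha.foo"))  # resolves like plain "alpha"
print(glyph_name_to_text("alpha1"))     # not in the list: None
```

With these assumptions a name with a period suffix still resolves to its base name, while a name with an appended digit is simply an unknown name, which is why the second kind spoils cut and paste in viewers that rely on names.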

> It looks like that currently pdftex generates CMap from glyph names.
> Theoretically it should be possible to assign fully unique glyph names
> for every one glyph, possible fully random and then into CMap table put
> correct mapping for all character codes (as CMap table does not use
> glyph names) according to enc file.

That would confuse some viewers too (I remember a thread about
non-standard ffi ligature names, with the resolving hard coded in some
viewer, and the request for tex related fonts to conform to that bad
practice too).

> Correct PDF viewers which use CMap table will load character ==> Unicode
> mapping from CMap table. "not so good" PDF viewers stay broken.

Indeed, or worse: they behave inconsistently across releases (which makes
it hard to predict).

>>> File test.tex:
>>> ============
>>> \pdfglyphtounicode{mychar}{269}
>>> \pdfgentounicode=1
>>> \pdfmapline{cmb10 <my.enc}
>>> \font\cmb=cmb10
>>> \cmb
>>> a b
>>> \bye
>>> ============
>>>
>>> And result PDF file would not render glyph 'a' if function
>>> remove_duplicate_glyph_names() is disabled. There would be two glyphs 'b'.

But still, I think that the fact that there are duplicate names in the
my.enc file is the real problem: if two b's refer to different shapes,
then what is the real 'b'? And what is the right new name: b.one, b.two?
What does one expect with cut and paste? If two names are the same and
they refer to the same font program then there is no problem, and the
first one encountered when embedding should be used.

If removing duplicates is an option in pdftex then at least make sure
that it's off by default (better: complain loudly on the console that the
enc is broken) so that the user knows that enabling that option does not
solve the problem (and in tex distributions the fixed enc should be
used). Heuristics and fixes for buggy fonts are nice, but not being able
to bypass them is bad.

(Multiple .notdef entries are an exception.)
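
As an illustration of "complain loudly" instead of silently renaming, here is a rough sketch in Python of a duplicate check on an enc file. It is not pdftex code and not how remove_duplicate_glyph_names() works. The function name find_duplicate_glyph_names, the simple regex parsing, and the sample encoding are assumptions made for this example. It only assumes the usual enc layout: a PostScript name followed by a bracketed vector of /names, with % comments.

```python
import re
from collections import defaultdict

def find_duplicate_glyph_names(enc_text):
    """Map each glyph name used more than once to the slots where it occurs."""
    # Strip PostScript comments (from % to the end of the line).
    text = re.sub(r"%.*", "", enc_text)
    # The encoding vector is whatever sits between the first [ and the last ].
    start, end = text.find("["), text.rfind("]")
    if start < 0 or end < start:
        raise ValueError("no encoding vector found")
    names = re.findall(r"/([^\s/\[\]]+)", text[start + 1:end])
    slots = defaultdict(list)
    for slot, name in enumerate(names):
        slots[name].append(slot)
    # Repeated .notdef entries are fine; any other repeated name is reported.
    return {name: where for name, where in slots.items()
            if len(where) > 1 and name != ".notdef"}

# Hypothetical broken encoding, shortened to six slots for the example.
sample = """
% hypothetical broken encoding
/MyEncoding [
  /.notdef /.notdef /mychar /b   % slots 0-3
  /b /c
] def
"""
for name, where in find_duplicate_glyph_names(sample).items():
    print(f"warning: glyph name /{name} is used for slots {where}")
```

For the sample this prints one warning, for /b in slots 3 and 4, and stays silent about the repeated .notdef.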

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------

