On 12/15/2017 9:27 PM, Pali Rohár wrote:
> 1) Glyph names
> 2) CMap encoding table
> The CMap table maps each character code to a Unicode (codepoint) sequence, and PDF viewers should use this mapping table to assign Unicode codepoints to the glyphs they render.
> But in reality there are "not so good" PDF viewers which ignore the CMap table stored in the PDF file and instead do their own mapping from glyph name to Unicode codepoint.
As Type 1 can be mapped onto a wide font, the glyph name is probably less of an issue there, so most of the encoding data can be omitted. In CFF2 even less is needed. For copy and paste the ToUnicode mapping is needed, and when it is absent, glyph names play an (unreliable) role. My experience is that Acrobat normally does things right (but has some weird limitations in the renderer), MuPDF-based viewers render perfectly and do a reasonable cut and paste, and xpdf and friends are unreliable with cut and paste and have rendering issues too. So, when you create extra glyph names for Type 3, they need to (somehow) obey the Adobe logic (alpha.foo alongside alpha), as appending some number or character will spoil the cut and paste (depending on the viewer).
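The "Adobe logic" is the glyph-name-to-Unicode algorithm from the Adobe Glyph List (AGL) Specification. It works in three steps. First, drop everything from the first period onward. Second, split the remainder on underscores. Third, resolve each component through the AGL or through the `uniXXXX` / `uXXXX[XX]` forms. Viewers that fall back to glyph names often behave roughly like this, but individual viewers may deviate. The following is a minimal Python sketch of that algorithm. `AGL_EXCERPT` is a hypothetical six-entry stand-in for the full list, and the spec's special case for ZapfDingbats is left out.

```python
import re

# Small, hypothetical excerpt of the Adobe Glyph List (AGL); a real
# implementation would load the complete list published by Adobe.
AGL_EXCERPT = {
    "a": "a",
    "b": "b",
    "f": "f",
    "i": "i",
    "alpha": "\u03b1",
    "ffi": "\ufb03",
}

# "uni" + one or more groups of four uppercase hex digits, e.g. uni03B1.
_UNI = re.compile(r"uni((?:[0-9A-F]{4})+)")
# "u" + four to six uppercase hex digits, e.g. u1D400.
_U = re.compile(r"u([0-9A-F]{4,6})")


def _is_scalar(cp: int) -> bool:
    """True for Unicode scalar values (no surrogates, nothing above U+10FFFF)."""
    return cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF


def component_to_unicode(comp: str) -> str:
    """Map one underscore-separated component; unknown names map to ""."""
    if comp in AGL_EXCERPT:
        return AGL_EXCERPT[comp]
    m = _UNI.fullmatch(comp)
    if m:
        digits = m.group(1)
        cps = [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]
        return "".join(map(chr, cps)) if all(map(_is_scalar, cps)) else ""
    m = _U.fullmatch(comp)
    if m:
        cp = int(m.group(1), 16)
        return chr(cp) if _is_scalar(cp) else ""
    return ""


def glyph_name_to_unicode(name: str) -> str:
    """Resolve a glyph name roughly the way the AGL specification describes."""
    base = name.split(".", 1)[0]  # "alpha.foo" -> "alpha"
    return "".join(component_to_unicode(c) for c in base.split("_"))


for n in ["alpha", "alpha.foo", "alpha1", "f_f_i", "uni03B1", "u1D400"]:
    # ascii() keeps the output printable on any console encoding.
    print(n, ascii(glyph_name_to_unicode(n)))
```

Under this logic, `alpha.foo` resolves to U+03B1 exactly like `alpha`. By contrast, `alpha1` resolves to the empty string, so a name with an appended digit loses its text on cut and paste. A ligature name like `f_f_i` resolves to the three letters "ffi".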
> It looks like pdftex currently generates the CMap from glyph names. Theoretically it should be possible to assign a fully unique glyph name to every glyph, possibly fully random, and then put the correct mapping for all character codes into the CMap table (as the CMap table does not use glyph names) according to the enc file.
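The idea of keying the CMap by code rather than by name can be sketched in Python. The sketch assumes a Unicode value is already known for each *original* glyph name. Here that comes from a small hypothetical `GLYPH_TO_UNICODE` table in the spirit of pdftex's `\pdfglyphtounicode`, whose second argument is hexadecimal. Every code is resolved through the enc vector before any renaming, so the ToUnicode CMap depends only on character codes. The helper names are hypothetical, and this is not how pdftex itself builds its CMap. Destination strings are written as UTF-16BE, and `bfchar` blocks are capped at 100 entries as the PDF specification requires.

```python
from __future__ import annotations

# Hypothetical name -> Unicode table, similar in spirit to pdftex's
# \pdfglyphtounicode entries (e.g. \pdfglyphtounicode{mychar}{269}).
GLYPH_TO_UNICODE = {"a": "a", "b": "b", "mychar": "\u0269"}


def code_to_text(enc: list[str], glyph_to_unicode: dict[str, str]) -> dict[int, str]:
    """Resolve every character code to its Unicode text via the encoding vector."""
    return {
        code: glyph_to_unicode[name]
        for code, name in enumerate(enc)
        if name in glyph_to_unicode
    }


def unique_names(enc: list[str]) -> list[str]:
    """Give every used slot its own opaque glyph name, e.g. 'g97' for code 97."""
    return [".notdef" if name == ".notdef" else f"g{code}" for code, name in enumerate(enc)]


def to_unicode_cmap(mapping: dict[int, str]) -> str:
    """Emit a ToUnicode CMap for a simple (one-byte) font from code -> text."""
    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<00> <FF>",
        "endcodespacerange",
    ]
    items = sorted(mapping.items())
    # At most 100 entries may appear between beginbfchar and endbfchar.
    for start in range(0, len(items), 100):
        chunk = items[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        for code, text in chunk:
            # UTF-16BE encoding turns supplementary characters into surrogate pairs.
            lines.append(f"<{code:02X}> <{text.encode('utf-16-be').hex().upper()}>")
        lines.append("endbfchar")
    lines += ["endcmap", "CMapName currentdict /CMap defineresource pop", "end", "end"]
    return "\n".join(lines)
```

Because the CMap never mentions glyph names, renaming the glyphs to `g97`, `g98`, ... leaves it untouched. A viewer that ignores the CMap would, however, see only the opaque names.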
That would confuse some viewers too (I remember a thread about non-standard ffi ligature names being resolved by hard-coded logic in some viewer, and the request for TeX-related fonts to conform to that bad practice too).
> Correct PDF viewers, which use the CMap table, will load the character ==> Unicode mapping from it. "Not so good" PDF viewers stay broken.
Indeed, or worse: they behave inconsistently across releases (which makes them hard to predict).
> File test.tex:
> ============
> \pdfglyphtounicode{mychar}{269}
> \pdfgentounicode=1
> \pdfmapline{cmb10
> And the resulting PDF file would not render glyph 'a' if the function remove_duplicate_glyph_names() is disabled. There would be two glyphs 'b'.
But still, I think that the duplicate names in the my.enc file are the real problem: if two b's refer to different shapes, then which is the real 'b'? And what is the right new name: b.one, b.two? What does one expect with cut and paste? If two names are the same and they refer to the same font program, then there is no problem and the first one encountered when embedding should be used.
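One alternative is to report the broken encoding rather than silently renaming glyphs. The Python sketch below shows roughly how that could work. It only lists glyph names that occur at more than one code and exempts `.notdef`, which routinely fills unused slots. The actual fix is left to whoever maintains the `.enc` file. Both function names are hypothetical and unrelated to pdftex's `remove_duplicate_glyph_names()`.

```python
from __future__ import annotations

import sys
from collections import defaultdict


def duplicate_glyph_names(enc: list[str]) -> dict[str, list[int]]:
    """Map every glyph name used at more than one code to those codes.

    .notdef is skipped: repeating it for unused slots is normal.
    """
    slots = defaultdict(list)
    for code, name in enumerate(enc):
        if name != ".notdef":
            slots[name].append(code)
    return {name: codes for name, codes in slots.items() if len(codes) > 1}


def check_encoding(enc: list[str], enc_name: str = "my.enc") -> bool:
    """Warn on stderr about each duplicate; return True if the encoding is clean."""
    dups = duplicate_glyph_names(enc)
    for name, codes in sorted(dups.items()):
        print(
            f"warning: {enc_name}: glyph name '{name}' used for codes "
            + ", ".join(str(c) for c in codes),
            file=sys.stderr,
        )
    return not dups
```

Returning a flag instead of rewriting names keeps the decision with the user. A caller could abort, or opt into a repair heuristic explicitly.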
If removing duplicates is an option in pdftex, then at least make sure that it's off by default (better to complain loudly on the console that the enc is broken), so that the user knows that enabling that option does not solve the problem (and in TeX distributions the fixed enc should be used). Heuristics and fixes for buggy fonts are nice, but not being able to bypass them is bad. (Multiple .notdef is an exception.)

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------