On Monday 18 December 2017 11:17:45 Hans Hagen wrote:
It looks like pdftex currently generates the CMap from glyph names. Theoretically it should be possible to assign a fully unique glyph name to every glyph, possibly fully random, and then put the correct mapping for all character codes into the CMap table (as the CMap table does not use glyph names) according to the enc file.
that would confuse some viewers too (i remember some thread about non standard ffi ligature names and resolving hard coded in some viewer and the request for tex related fonts to conform to that bad practice too)
The first occurrence of a duplicate can use the originally specified glyph name, and the second, third, ... occurrences can use new unique glyph names (with a proper CMap table). Yes, that would not fix the problem for those "some" viewers, but in this situation it is better than nothing.
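The renaming idea above can be sketched as follows. This is only an illustration in Python, not pdftex code: the function name `uniquify_glyph_names` and the `.dupN` suffix are hypothetical, and the encoding vector is assumed to be a plain list of glyph names indexed by character code. Multiple `.notdef` entries are left alone.

```python
def uniquify_glyph_names(enc_names):
    """Keep the first occurrence of each glyph name and rename later
    duplicates to fresh names; '.notdef' may repeat freely.

    enc_names: list of glyph names indexed by character code
               (assumed shape of a parsed enc file).
    """
    # Every original name is reserved, so a generated name can never
    # collide with a name that appears later in the encoding.
    used = {n for n in enc_names if n != ".notdef"}
    seen = set()
    out = []
    for name in enc_names:
        if name == ".notdef" or name not in seen:
            seen.add(name)
            out.append(name)
            continue
        # Duplicate: look for an unused name of the (hypothetical) form name.dupN.
        n = 1
        candidate = f"{name}.dup{n}"
        while candidate in used:
            n += 1
            candidate = f"{name}.dup{n}"
        used.add(candidate)
        out.append(candidate)
    return out
```

For an encoding that starts with `a`, `b`, `b`, the third slot would get a new name while the first two keep theirs. The Unicode mapping for the renamed slot then has to come from the character code, not from the new glyph name.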
File test.tex:
============
\pdfglyphtounicode{mychar}{269}
\pdfgentounicode=1
\pdfmapline{cmb10
And the resulting PDF file would not render glyph 'a' if the function remove_duplicate_glyph_names() is disabled. There would be two glyphs 'b'.
but still i think that the fact that there are duplicate names in my.enc file is the real problem: if two b's refer to different shapes then what is the real 'b'? And what is the right new name: b.one, b.two ?
If you have two shapes for b, then you can assign the glyph name 'b' to only one shape in the final PDF. What you can do is create a CMap table in which both characters are mapped to the Unicode code point for 'b'. PDF viewers which do not use the CMap would not be able to copy+paste properly. But this is the current situation, as /ToUnicode is not supported for Type3 fonts yet. Anyway, exactly the same problem exists for Type 1 fonts. If you have two different shapes for b in a Type 1 font, then only one can have the glyph name 'b'.
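A minimal sketch of such a mapping, assuming the /ToUnicode entries are built per character code from the names in the original enc file, so that two slots both named 'b' both map to U+0062. The function name `tounicode_bfchar` and the `glyph_to_unicode` dictionary are hypothetical; this is Python for illustration and not how pdftex implements it.

```python
def tounicode_bfchar(enc_names, glyph_to_unicode):
    """Build bfchar lines '<code> <UTF-16BE hex>' for a ToUnicode CMap.

    enc_names:        glyph names from the enc file, indexed by character code
    glyph_to_unicode: dict mapping glyph name -> Unicode code point (assumed input)
    """
    lines = []
    for code, name in enumerate(enc_names):
        cp = glyph_to_unicode.get(name)
        if cp is None:
            continue  # no known mapping (e.g. .notdef): emit nothing for this code
        # The destination string of a bfchar entry is UTF-16BE encoded.
        dst = chr(cp).encode("utf-16-be").hex().upper()
        lines.append(f"<{code:02X}> <{dst}>")
    return lines
```

Because the lookup is keyed by character code, it does not matter which glyph names end up in the embedded font; duplicates in the enc file simply produce several codes with the same destination.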
What does one expect with cut and paste?
The expected behavior for an ordinary user is simple: both glyphs which are marked as 'b' should be copied as the character 'b'. That can work only in PDF viewers with correct CMap support, and with the current pdftex code it is not possible. But you are right that this is a real problem. Some calligraphic fonts have several glyphs for one character, and the decision which glyph to use is based on the previous or next characters.
If two names are the same and they refer to the same font program then there is no problem and the first one encountered when embedding should be used.
If remove duplicates is an option in pdftex then at least make sure that it's off by default (better complain loudly on the console that the enc is broken)
Do you want this problem to be a fatal error?
so that the user knows that enabling that option is not solving the problem (and in tex distributions the fixed enc should be used). Heuristics and fixes for bugged fonts are nice but not being able to bypass them is bad.
I thought it would be better to produce the PDF file, as the enc file itself does not change how the PDF file is rendered. It affects only copy+paste from the PDF file.
(multiple .notdef is an exception)
A different, but maybe more interesting, question is: what happens for other font formats if the supplied enc file contains duplicate names?
-- 
Pali Rohár
pali.rohar@gmail.com