Re: [NTG-pdftex] [PATCH v4] Allow .enc files for bitmap PK fonts

18 Dec 2017

      On Monday 18 December 2017 11:17:45 Hans Hagen wrote:
...
...
It looks like pdftex currently generates the CMap from glyph names.
Theoretically it should be possible to assign a fully unique glyph name
to every glyph, possibly a fully random one, and then put into the CMap
table the correct mapping for all character codes according to the enc
file (the CMap table does not use glyph names).
that would confuse some viewers too (i remember some thread about non
standard ffi ligature names and resolving hard coded in some viewer and the
request for tex related fonts to conform to that bad practice too)
The first occurrence of a duplicate can use the originally specified
glyph name, and the second, third, ... occurrences can use a new unique
glyph name (with a proper CMap table). Yes, that would not fix the
problem for those "some" viewers, but in this situation it is better
than nothing.
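
A minimal sketch of that renaming idea, in Python rather than pdftex's
own C code. The function name uniquify_glyph_names and the ".dupN"
suffix are made up for illustration; pdftex has nothing with these
names. Repeated .notdef entries are left alone.

```python
def uniquify_glyph_names(enc):
    """Keep the first occurrence of each name, rename later duplicates.

    enc is a list of glyph names indexed by character code.
    Returns (new_names, origin), where origin maps every emitted name
    back to the name written in the enc file.
    """
    used = set(enc)   # names that a renamed glyph must not collide with
    count = {}        # how often each name has been seen so far
    new_names = []
    origin = {}
    for name in enc:
        n = count.get(name, 0)
        count[name] = n + 1
        new = name
        if n > 0 and name != ".notdef":
            # second, third, ... occurrence: pick a fresh, unused name
            # (the ".dupN" scheme is hypothetical)
            k = n
            new = f"{name}.dup{k}"
            while new in used:
                k += 1
                new = f"{name}.dup{k}"
            used.add(new)
        new_names.append(new)
        origin[new] = name
    return new_names, origin
```

The origin table keeps the link to the name from the enc file, so a
CMap built per character code can still use the original name for the
Unicode lookup.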
...
...
...
...
File test.tex:
============
\pdfglyphtounicode{mychar}{269}
\pdfgentounicode=1
\pdfmapline{cmb10 
And the resulting PDF file would not render glyph 'a' if the function
remove_duplicate_glyph_names() is disabled. There would be two glyphs 'b'.
but still i think that the fact that there are duplicate names in my.enc
file is the real problem: if two b's refer to different shapes then what is
the real 'b'? And what is the right new name: b.one, b.two ?
If you have two shapes for b, then you can assign the glyph name 'b' to
only one shape in the final PDF. What you can do is create a CMap table
where both characters are mapped to the Unicode code point for 'b'.
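
As a rough illustration of such a table, here is a Python sketch that
emits only the bfchar part of a ToUnicode CMap for a one-byte encoding.
It is not what pdftex does today; the function name bfchar_section and
the glyph_to_unicode table are assumptions for the example. The lookup
goes by character code and uses the names as written in the enc file,
so it does not matter what the glyphs end up being called in the font.

```python
def bfchar_section(enc, glyph_to_unicode, block_size=100):
    """Return bfchar blocks mapping character codes to Unicode.

    enc: list of glyph names indexed by character code (0..255).
    glyph_to_unicode: dict from glyph name to Unicode code point.
    """
    entries = []
    for code, name in enumerate(enc):
        cp = glyph_to_unicode.get(name)
        if cp is None:
            continue  # no known Unicode value: leave this code unmapped
        # destination string is UTF-16BE written as hex digits
        dst = chr(cp).encode("utf-16-be").hex().upper()
        entries.append(f"<{code:02X}> <{dst}>")
    out = []
    # CMap files conventionally keep at most 100 entries per block
    for i in range(0, len(entries), block_size):
        block = entries[i:i + block_size]
        out.append(f"{len(block)} beginbfchar")
        out.extend(block)
        out.append("endbfchar")
    return "\n".join(out)


# Two different character codes, both named 'b' in the enc file.
enc = [".notdef"] * 256
enc[0x61] = "b"
enc[0x62] = "b"
print(bfchar_section(enc, {"b": 0x0062}))
```

For this input the sketch prints a block with two entries, <61> <0062>
and <62> <0062>, so both codes copy as 'b' in a viewer that honours the
CMap.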

PDF viewers which do not use the CMap would not be able to copy+paste
properly. But this is the current situation anyway, as /ToUnicode is not
yet supported for Type3 fonts.

Anyway, exactly the same problem exists for Type 1 fonts. If you have two
different shapes for b in a Type 1 font, then only one can have the glyph
name 'b'.
...
What does one expect with cut and paste?
The expected behavior for an ordinary user is simple: both glyphs which
are marked as 'b' should be copied as the character 'b'.

It can work only in PDF viewers with correct CMap support. But with the
current pdftex code it is not possible.

But you are right that this is a real problem. Some calligraphic fonts
have several glyphs for one character, and the decision which glyph to
use is based on the previous or next characters.
...
If two names are the same and they refer to the
same font program then there is no problem and the first one encountered
when embedding should be used.
If remove duplicates is an option in pdftex then at least make sure that
it's off by default (better complain loudly on the console that the enc is
broken)
Do you want this problem to be a fatal error?
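
To make the two options concrete, here is a small Python sketch of a
check that either only complains or stops with an error. The names
find_duplicate_names and check_enc and the fatal switch are
hypothetical, not existing pdftex functions or options. Repeated
.notdef entries are not counted as duplicates.

```python
import sys


def find_duplicate_names(enc):
    """Return {name: [codes]} for names used at more than one code."""
    codes = {}
    for code, name in enumerate(enc):
        if name == ".notdef":
            continue  # multiple .notdef entries are allowed
        codes.setdefault(name, []).append(code)
    return {n: c for n, c in codes.items() if len(c) > 1}


def check_enc(enc, fatal=False, out=sys.stderr):
    """Warn about duplicate names; raise ValueError if fatal is set."""
    dups = find_duplicate_names(enc)
    for name, codes in dups.items():
        where = ", ".join(str(c) for c in codes)
        print(f"warning: glyph name '{name}' used at codes {where}",
              file=out)
    if dups and fatal:
        raise ValueError("enc file contains duplicate glyph names")
    return dups
```

With fatal left off, the run continues and the PDF is still produced;
with it on, a broken enc file stops processing.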
...
so that the user knows that enabling that option is not solving the
problem (and in tex distributions the fixed enc should be used). Heuristics
and fixes for bugged fonts are nice but not being able
to bypass them is bad.
I thought it would be better to still produce the PDF file, as the enc
file itself does not change how the PDF file is rendered. It affects only
copy+paste from the PDF file.
...
(multiple .notdef is an exception)
A different, but maybe more interesting, question is: what happens for
other font formats if the supplied enc file contains duplicate names?

-- 
Pali Rohár
pali.rohar@gmail.com