Re: [NTG-pdftex] [PATCH v4] Allow .enc files for bitmap PK fonts

15 Dec 2017

      On Friday 15 December 2017 20:47:30 Hans Hagen wrote:
...
On 12/15/2017 7:12 PM, Pali Rohár wrote:
...
On Friday 15 December 2017 17:13:22 Karl Berry wrote:
...
(Sorry for the delayed reply.)
Date: Sat, 19 Aug 2017 16:02:17 +0200
     From: Pali Rohár 
     Subject: [PATCH v4] Allow .enc files for bitmap PK fonts
Thanks for splitting the patch into those separate pieces, Pali, and
doing the test and documentation updates. Very helpful. Reading through
the changes, they generally look fine.
My only question at the moment is, why do duplicate glyph names have to
be removed in advance (in patch 3)? Otherwise we'll try to put two
glyphs by the same (PostScript/PDF) name in the output font? Or
something else? --thanks, karl.
Hi! Glyph names are put into the PDF /Differences array, and the glyphs
themselves are identified in the PDF by their names. So we cannot have
two different glyphs with the same name in a PDF file.
Where does the pdf standard mention that limitation? Why should glyph names
be unique? If there is some encoding issue it looks more like there is a
shared Differences related dictionary / array that should not be shared.
In the /Differences array you assign a character code to each glyph name.
Then in /CharProcs (for a Type 3 font) you assign a glyph definition to
each glyph name.

/CharProcs is a PDF dictionary (page 421 in PDF Reference version 1.7),
and it is undefined what happens if a PDF dictionary contains the same
key twice (page 59).

Basically a glyph is identified by its name, not by its character code,
so two different character codes need to have two different glyph names
(if those character codes render differently).
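
The collision described above can be illustrated with a small Python
sketch. A Python dict keeps the last value stored under a repeated key,
which is only one possible behaviour; the PDF Reference leaves it
undefined. The names differences, pk_glyphs and char_procs and the
placeholder strings are hypothetical and do not come from pdftex.

```python
# Character code -> glyph name, as a /Differences array would assign them.
differences = {97: "mychar", 98: "mychar"}

# Placeholder glyph definitions taken from the PK font, keyed by character code.
pk_glyphs = {97: "<bitmap of a>", 98: "<bitmap of b>"}

# /CharProcs is keyed by glyph name, so the second entry overwrites the first.
char_procs = {}
for code, name in differences.items():
    char_procs[name] = pk_glyphs[code]


def render(code):
    """Resolve a character code as a Type 3 font does: code -> name -> definition."""
    return char_procs[differences[code]]


print(len(char_procs))         # only one definition survived
print(render(97), render(98))  # both codes resolve to the same glyph
```

In this model the glyph for code 97 is lost, because both codes share the
single surviving entry.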
...
...
The function remove_duplicate_glyph_names() just removes duplicate glyph
names from the enc file. Later, for each glyph index, the function
writet3() uses either the glyph name or, if none is available (e.g.
because of duplicates), the name "a" (like before). This ensures that
every glyph has a unique name in the PDF file.
If you comment out the remove_duplicate_glyph_names() call, you will see
what happens. pdftex would not be able to create a PDF file with two
different glyphs under the same name and would store just one glyph. That
would result in a damaged PDF font: one glyph would be used for all
characters that have that one glyph name associated in the enc file.
Probably it would be the glyph with the highest index.
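
A minimal Python sketch of the idea, not the code from the patch. It
assumes that the first occurrence of a name keeps it and later
occurrences are blanked, that /.notdef entries are left alone, and that
the fallback name is "a" followed by the character code. All three are
assumptions of this sketch; the helper glyph_name() is hypothetical.

```python
def remove_duplicate_glyph_names(names):
    """Return a copy of the encoding vector with repeated glyph names blanked.

    Assumption: the first occurrence keeps its name, later ones become None.
    """
    seen = set()
    result = []
    for name in names:
        if name is None or name == ".notdef":
            result.append(name)      # nothing to deduplicate
        elif name in seen:
            result.append(None)      # duplicate: drop the name
        else:
            seen.add(name)
            result.append(name)
    return result


def glyph_name(names, code):
    """Name under which the glyph for `code` would be written (hypothetical)."""
    name = names[code]
    if name is None or name == ".notdef":
        return "a%d" % code          # assumed fallback naming scheme
    return name


vector = [".notdef"] * 97 + ["mychar", "mychar"] + [".notdef"] * 157
cleaned = remove_duplicate_glyph_names(vector)
print(glyph_name(cleaned, 97), glyph_name(cleaned, 98))
```

With the duplicate blanked, every one of the 256 codes ends up with a
distinct name, so no two glyph definitions compete for one key.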
...
Test case for reproducing should be easy:
File my.enc:
============
/my [
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef
/mychar /mychar
/.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
] def
============
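
For anyone who would rather generate such a file than type it, here is a
rough Python sketch. It produces the same 256-entry vector with /mychar
at codes 97 and 98 (the characters 'a' and 'b'), although the line
breaks differ from the listing above. The function name build_enc is
hypothetical.

```python
def build_enc(name, glyphs, per_line=8):
    """Format a 256-entry encoding vector as the text of an .enc file."""
    vector = [".notdef"] * 256
    for code, glyph in glyphs.items():
        vector[code] = glyph
    lines = ["/%s [" % name]
    for start in range(0, 256, per_line):
        lines.append(" ".join("/" + g for g in vector[start:start + per_line]))
    lines.append("] def")
    return "\n".join(lines) + "\n"


# The same glyph name is deliberately assigned to two character codes.
enc_text = build_enc("my", {ord("a"): "mychar", ord("b"): "mychar"})
print(enc_text)
```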
Ok, but that is not related to pdf (as format)
It is related to the PDF format, see above. More details are in the PDF
Specification itself. In version 1.7 it is in section "5.5 Simple Fonts"
starting at page 412.
...
but to a bad vector and/or
pdftex not taking the right one ... is messing around with names (thereby
obscuring the problem) better than fixing the enc file? After all, now one
of the glyphs will still have the wrong name.
Basically there are two different things:

1) Glyph names

2) CMap encoding table

The CMap table holds the mapping from character code to Unicode
(codepoint) sequence. PDF viewers should use this mapping table to
assign a Unicode codepoint to each glyph they render.

But the reality is that there are "not so good" PDF viewers which ignore
the CMap table stored in the PDF file and do some mapping from glyph
name to Unicode codepoint.

It looks like pdftex currently generates the CMap from glyph names.
Theoretically it should be possible to assign a fully unique glyph name
to every glyph, possibly fully random, and then put the correct mapping
for all character codes into the CMap table (as the CMap table does not
use glyph names) according to the enc file.

Correct PDF viewers which use the CMap table will load the character ==>
Unicode mapping from it. "Not so good" PDF viewers stay broken.
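
The theoretical approach can be sketched in Python as follows. This is
not what pdftex does today. The function names, the "g" + hex naming
scheme and the code point U+010D are arbitrary choices made for this
sketch, and only code points that fit in four hex digits are handled.

```python
def unique_names_and_tounicode(enc_vector, glyph_to_unicode):
    """Give every used code its own glyph name and map codes to Unicode.

    The Unicode value is looked up via the enc file's glyph name, but the
    name written to the PDF is unrelated to it (hypothetical scheme).
    """
    names = {}
    to_unicode = {}
    for code, enc_name in enumerate(enc_vector):
        if enc_name == ".notdef":
            continue
        names[code] = "g%02X" % code   # arbitrary, unique per character code
        if enc_name in glyph_to_unicode:
            to_unicode[code] = glyph_to_unicode[enc_name]
    return names, to_unicode


def bfchar_lines(to_unicode):
    """Format the mapping as bfchar entries of a ToUnicode CMap."""
    return ["<%02X> <%04X>" % (code, cp) for code, cp in sorted(to_unicode.items())]


vector = [".notdef"] * 97 + ["mychar", "mychar"] + [".notdef"] * 157
names, to_unicode = unique_names_and_tounicode(vector, {"mychar": 0x010D})
print(names[97], names[98])
print("\n".join(bfchar_lines(to_unicode)))
```

Both codes get the same Unicode value but different glyph names, so
their definitions no longer collide. A viewer that derives Unicode from
glyph names would find nothing useful in names like these.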
...
...
File test.tex:
============
\pdfglyphtounicode{mychar}{269}
\pdfgentounicode=1
\pdfmapline{cmb10 
And the resulting PDF file would not render glyph 'a' if the function
remove_duplicate_glyph_names() is disabled. There would be two glyphs 'b'.
-- 
Pali Rohár
pali.rohar@gmail.com