On Friday 15 December 2017 20:47:30 Hans Hagen wrote:
On 12/15/2017 7:12 PM, Pali Rohár wrote:
On Friday 15 December 2017 17:13:22 Karl Berry wrote:
(Sorry for the delayed reply.)
Date: Sat, 19 Aug 2017 16:02:17 +0200
From: Pali Rohár
Subject: [PATCH v4] Allow .enc files for bitmap PK fonts

Thanks for splitting the patch into those separate pieces, Pali, and doing the test and documentation updates. Very helpful. Reading through the changes, they generally look fine.
My only question at the moment is, why do duplicate glyph names have to be removed in advance (in patch 3)? Otherwise we'll try to put two glyphs by the same (PostScript/PDF) name in the output font? Or something else? --thanks, karl.
Hi! Glyph names are put into the /Differences table in the PDF, and the glyphs themselves are identified in the PDF by their names. So we cannot have two different glyphs with the same name in a PDF file.
Where does the PDF standard mention that limitation? Why should glyph names be unique? If there is some encoding issue, it looks more like there is a shared Differences-related dictionary / array that should not be shared.
In the /Differences table you assign a character code to each glyph name. Then in /CharProcs (for a Type 3 font) you assign a glyph definition to each glyph name. /CharProcs is a PDF dictionary (page 421 in PDF Reference version 1.7), and it is undefined what happens if a PDF dictionary contains the same key twice (page 59). Basically, a glyph is identified by its name, not by its character code, so two different character codes need two different glyph names (if those character codes render differently).
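The name collision described above can be illustrated with a toy model. This is only a sketch, not pdftex code: a PDF dictionary is modelled as a Python dict, the character codes 97 and 98 and the bitmap strings are made up for illustration, and "last write wins" is just one possible outcome of what the PDF Reference leaves undefined.

```python
# Toy model, not pdftex code: a PDF dictionary is modelled as a Python dict.
# Character codes 97 and 98 and the "<bitmap ...>" strings are illustrative.
differences = {97: "mychar", 98: "mychar"}  # character code -> glyph name
bitmaps = {97: "<bitmap for code 97>", 98: "<bitmap for code 98>"}

# /CharProcs is keyed by glyph name, so the second write replaces the first.
char_procs = {}
for code in sorted(differences):
    char_procs[differences[code]] = bitmaps[code]


def render(code):
    """Look a glyph up the way a viewer would: code -> name -> definition."""
    return char_procs[differences[code]]


print(len(char_procs))           # 1: only one glyph definition survives
print(render(97) == render(98))  # True: both codes show the same glyph
```

In this model the definition written for code 97 is lost, and both character codes resolve to the glyph stored last.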
The function remove_duplicate_glyph_names() just removes duplicate glyph names from the enc file. Later, writet3() uses the glyph name for each glyph index or, if no name is available (e.g. because of duplicates), a name beginning with "a" (like before). This ensures that every glyph has a unique name in the PDF file. If you comment out the call to remove_duplicate_glyph_names(), you will see what happens: pdftex cannot create a PDF file with two different glyphs under the same name and stores just one glyph. The result is a damaged PDF font in which one glyph is used for all characters associated with that glyph name in the enc file. Probably it would be the glyph with the highest index.
Test case for reproducing should be easy:
File my.enc: ============ /my [ /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /mychar /mychar /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef 
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef ] def ============
Ok, but that is not related to pdf (as format)
It is related to the PDF format; see above. More details are in the PDF specification itself: in version 1.7, see section "5.5 Simple Fonts", starting at page 412.
but to a bad vector and/or pdftex not taking the right one ... is messing around with names (thereby obscuring the problem) better than fixing the enc file? After all, now one of the glyphs will still have the wrong name.
Basically there are two different things:

1) Glyph names
2) CMap encoding table

The CMap table maps character codes to Unicode (code point) sequences, and PDF viewers should use this table to assign a Unicode code point to each glyph they render. In reality there are "not so good" PDF viewers that ignore the CMap table stored in the PDF file and instead map glyph names to Unicode code points. It looks like pdftex currently generates the CMap from glyph names.

Theoretically it should be possible to assign a fully unique, possibly fully random, name to every glyph and then put the correct mapping for all character codes into the CMap table according to the enc file (the CMap table does not use glyph names). Correct PDF viewers that use the CMap table will load the character ==> Unicode mapping from it; "not so good" PDF viewers stay broken.
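To show that such a mapping can be written without any glyph names, here is a minimal sketch that formats character code to Unicode pairs as bfchar-style entries of a ToUnicode CMap. The function name to_unicode_entries and the sample mapping are hypothetical; this is not how pdftex generates its CMap, and the surrounding CMap boilerplate is omitted.

```python
def to_unicode_entries(code_to_unicode):
    """Format bfchar-style lines: <code> <UTF-16BE hex of the code point>.

    code_to_unicode maps a single-byte character code to a Unicode
    code point. Glyph names are not involved at all.
    """
    lines = []
    for code in sorted(code_to_unicode):
        # ToUnicode destinations are written as UTF-16BE in hexadecimal.
        utf16 = chr(code_to_unicode[code]).encode("utf-16-be").hex().upper()
        lines.append("<%02X> <%s>" % (code, utf16))
    return lines


# Illustrative mapping only: codes 0x61 and 0x62 to U+0061 and U+0062.
entries = to_unicode_entries({0x61: 0x0061, 0x62: 0x0062})
for line in entries:
    print(line)
```

Because the entries are keyed by character code, two glyphs with arbitrary unique names could still get correct Unicode values this way.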
File test.tex: ============ \pdfglyphtounicode{mychar}{269} \pdfgentounicode=1 \pdfmapline{cmb10
And the resulting PDF file would not render glyph 'a' if the function remove_duplicate_glyph_names() is disabled. There would be two glyphs 'b'.
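For illustration, the effect of removing duplicates and falling back to a generated name can be sketched as below. This is not the code from the patch: the Python function names, the choice to keep the first occurrence, the special handling of .notdef, and the fallback format "a" plus the character code are all assumptions made for this example.

```python
def remove_duplicate_names(encoding):
    """Return a copy where repeated glyph names are replaced by None.

    Assumptions: the first occurrence is kept and ".notdef" is left alone.
    """
    seen = set()
    result = []
    for name in encoding:
        if name != ".notdef" and name in seen:
            result.append(None)  # duplicate: no usable name for this code
        else:
            seen.add(name)
            result.append(name)
    return result


def glyph_name(encoding, code):
    """Use the encoding's name if there is one, else a fallback name."""
    name = encoding[code]
    if name is None or name == ".notdef":
        return "a%d" % code  # hypothetical fallback format
    return name


# Encoding with the same name at codes 97 and 98 (illustrative positions).
enc = [".notdef"] * 256
enc[97] = enc[98] = "mychar"

clean = remove_duplicate_names(enc)
names = [glyph_name(clean, code) for code in (97, 98)]
print(names)  # ['mychar', 'a98']
```

With the duplicates removed, each of the two character codes ends up with its own glyph name, so neither definition overwrites the other.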
-- Pali Rohár pali.rohar@gmail.com