[NTG-pdftex] [PATCH v4] Allow .enc files for bitmap PK fonts

Hans Hagen pragma at wxs.nl
Mon Dec 18 13:11:20 CET 2017


On 12/18/2017 12:40 PM, Pali Rohár wrote:
> On Monday 18 December 2017 11:17:45 Hans Hagen wrote:
>>> It looks like pdftex currently generates the CMap from glyph names.
>>> Theoretically it should be possible to assign fully unique glyph names
>>> to every glyph, possibly fully random, and then put the correct
>>> mapping for all character codes into the CMap table (as the CMap table
>>> does not use glyph names) according to the enc file.
>>
>> that would confuse some viewers too (I remember a thread about non-standard
>> ffi ligature names being resolved by hard-coded rules in some viewer, and the
>> request for TeX-related fonts to conform to that bad practice too)
> 
> The first occurrence of a duplicate can use the originally specified glyph
> name, and the second, third, ... occurrences can use new unique glyph names
> (with a proper CMap table). Yes, that would not fix the problem for those
> "some" viewers, but in this situation it is better than nothing.

Two 'same' names in an enc file not referring to the same glyph is a 
bugged enc file. Personally I would not use such a font.

>>>>> File test.tex:
>>>>> ============
>>>>> \pdfglyphtounicode{mychar}{269}
>>>>> \pdfgentounicode=1
>>>>> \pdfmapline{cmb10 <my.enc}
>>>>> \font\cmb=cmb10
>>>>> \cmb
>>>>> a b
>>>>> \bye
>>>>> ============
>>>>>
>>>>> And the resulting PDF file would not render glyph 'a' if the function
>>>>> remove_duplicate_glyph_names() is disabled. There would be two glyphs 'b'.
>> but still i think that the fact that there are duplicate names in my.enc
>> file is the real problem: if two b's refer to different shapes then what is
>> the real 'b'? And what is the right new name: b.one, b.two ?
> 
> If you have two shapes for b, then you can assign the glyph name 'b' to
> only one shape in the final PDF. What you can do is create a CMap table
> where both characters are mapped to the Unicode code point for 'b'.
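To illustrate that mapping, here is a Python sketch that emits a minimal ToUnicode CMap for a simple one-byte font in which two hypothetical codes, 0x61 and 0x62, both map to U+0062. The helper name `to_unicode_cmap` and the example codes are assumptions; the skeleton follows the usual Adobe-Identity-UCS layout, with at most 100 entries per bfchar block.

```python
def to_unicode_cmap(mapping):
    """Build a ToUnicode CMap for a simple font.

    `mapping` maps a one-byte character code (0-255) to the Unicode text
    that code should produce on copy+paste.
    """
    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<00> <FF>",
        "endcodespacerange",
    ]
    items = sorted(mapping.items())
    # A bfchar block may hold at most 100 entries, so emit it in chunks.
    for start in range(0, len(items), 100):
        chunk = items[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        for code, text in chunk:
            # Destination strings are UTF-16BE, written as hex.
            utf16 = text.encode("utf-16-be").hex().upper()
            lines.append(f"<{code:02X}> <{utf16}>")
        lines.append("endbfchar")
    lines += [
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(lines)


# Both (hypothetical) codes copy as 'b', whatever their glyph names are.
print(to_unicode_cmap({0x61: "b", 0x62: "b"}))
```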

in that case the enc file should have dollar and dollar.oldstyle, or b
and b.smallcaps, i.e. a proper name, not something arbitrary

> PDF viewers which do not use the CMap would not be able to copy+paste
> properly. But this is the current situation, as /ToUnicode is not yet
> supported for Type 3 fonts.

if one follows the Adobe glyph naming convention it should work ok (at
least in Acrobat and MuPDF)

> Anyway, exactly the same problem exists for Type 1 fonts. If you have two
> different shapes for b in a Type 1 font, then only one can have the glyph
> name 'b'.

i've never seen a type 1 font with two 'same names' for different shapes 
... it would qualify as 'a font to avoid'

>> What does one expect with cut and paste?
> 
> The expected behavior for an ordinary user is simple: both glyphs that are
> marked as 'b' should be copied as the character 'b'.
> 
> It can work only in PDF viewers with correct CMap support. But with the
> current pdftex code it is not possible.

viewers can use the names instead

> But you are right that this is a real problem. Some calligraphic fonts
> have multiple glyphs for one character, and the decision which glyph to
> use is based on the previous or next characters.

then there's something like a.varianta, a.variantb, a.variantc, and cut
and paste will use the 'a' part to identify the character, just like f_f_i
is a convention for a ligature

>> If two names are the same and they refer to the
>> same font program then there is no problem and the first one encountered
>> when embedding should be used.
>>
>> If remove duplicates is an option in pdftex then at least make sure that
>> it's off by default (better complain loudly on the console that the enc is
>> broken)
> 
> Do you want this problem to be a fatal error?

Fatal in the sense that a viewer crashes? Sure. Then at least I know
that the 'b' in a font is probably not a 'b'. Also, in that case it's a
signal to avoid that font. (The same can be true for embedding fonts 
with bad font names that clash.)

FYI: I decided (in ConTeXt with LuaTeX, at least) to *not* use the
fontloader but to write one in Lua that stays close to the original font
and avoids the usual heuristics ... it's hard to fight (bad or fuzzy)
heuristics as they obscure problems.

>> so that the user knows that enabling that option is not solving the
>> problem (and in tex distributions the fixed enc should be used). Heuristics
>> and fixes for bugged fonts are nice but not being able
>> to bypass them is bad.
> 
> I thought it would be better to still produce the PDF file, as the enc
> file itself does not change how the PDF file is rendered. It affects only
> copy+paste from the PDF file.

But why not fix the enc file?

>> (multiple .notdef is an exception)
> 
> A different, but maybe more interesting, question is: what happens for
> other font formats if the supplied enc file contains duplicate names?

I can only speak for LuaTeX: we don't use enc files for Type 1 and
OpenType. And even for Type 3 (which I never use) I'd avoid them. In
fact, everything related to encodings is already dealt with when the
font is defined (loaded), and an afm or pfb file is normally ok. It makes
me wonder how these bad enc files can show up at all: those Type 3
fonts are very old school, and therefore the problem of duplicate names
for different shapes should also have been seen with dvips and so on.

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------

