[PATCH v4] Allow .enc files for bitmap PK fonts


Pali Rohár

19 Aug 2017

4:02 p.m.

Hi! I'm sending a new series of patches for PK fonts, reworked on top of the current texlive trunk svn repository. I split the changes into more patch files and also updated the documentation and tests. Karl, can you look at them?

-- Pali Rohár pali.rohar@gmail.com

Attachments:

0001-Cleanup-code-for-detection-of-bitmap-PK-font.patch (text/x-patch — 3.4 KB)
0002-Treat-all-bitmap-PK-fonts-as-non-scalable-in-all-cas.patch (text/x-patch — 3.8 KB)
0003-Allow-.enc-files-for-bitmap-PK-fonts.patch (text/x-patch — 5.9 KB)
0004-Add-checks-for-PK-font-files-defined-in-map-file.patch (text/x-patch — 1.4 KB)
0001-Update-documentation-tests-about-.enc-files-for-bitm.patch (text/x-patch — 3.3 KB)
signature.asc (application/pgp-signature — 198 bytes)


Karl Berry

19 Aug 2017

11:30 p.m.

Karl, can you look at them?

I will, but it will be some time. Thanks. -k

Pali Rohár

17 Sep 2017

10:16 a.m.

On Saturday 19 August 2017 23:30:12 Karl Berry wrote:

...

Karl, can you look at them?

I will, but it will be some time. Thanks. -k

Hi! When you have time, please look at it. I would like to see it in the next pdftex release, so it would be great if you could look at it well before the release; in case there are problems, I would then have time to fix them. Thanks!

-- Pali Rohár pali.rohar@gmail.com

Karl Berry

15 Dec 2017

6:13 p.m.

(Sorry for the delayed reply.)

Date: Sat, 19 Aug 2017 16:02:17 +0200
From: Pali Rohár
Subject: [PATCH v4] Allow .enc files for bitmap PK fonts

Thanks for splitting the patch into those separate pieces, Pali, and doing the test and documentation updates. Very helpful. Reading through the changes, they generally look fine.

My only question at the moment is, why do duplicate glyph names have to be removed in advance (in patch 3)? Otherwise we'll try to put two glyphs by the same (PostScript/PDF) name in the output font? Or something else?

--thanks, karl.

Pali Rohár

7:12 p.m.

On Friday 15 December 2017 17:13:22 Karl Berry wrote:

...

(Sorry for the delayed reply.)

Date: Sat, 19 Aug 2017 16:02:17 +0200 From: Pali Rohár Subject: [PATCH v4] Allow .enc files for bitmap PK fonts

Thanks for splitting the patch into those separate pieces, Pali, and doing the test and documentation updates. Very helpful. Reading through the changes, they generally look fine.

My only question at the moment is, why do duplicate glyph names have to be removed in advance (in patch 3)? Otherwise we'll try to put two glyphs by the same (PostScript/PDF) name in the output font? Or something else? --thanks, karl.

Hi! Glyph names are put into the /Differences table in the PDF, and the glyphs themselves are also identified in the PDF by their names. So we cannot have two different glyphs with the same name in a PDF file.

The function remove_duplicate_glyph_names() just removes duplicate glyph names from the enc file. Later, the function writet3() uses the glyph name for a glyph index or, if no name is available (e.g. because of duplicates), it uses the name "a" (like before). This ensures that every glyph has a unique name in the PDF file.

If you comment out the remove_duplicate_glyph_names() call, you will see what happens. pdftex would not be able to create a PDF file with two different glyphs with the same name and would store just one glyph. That would result in a damaged PDF font: one glyph would be used for all characters associated with that glyph name in the enc file. Probably it would be the glyph with the highest index.

A test case for reproducing this should be easy:

File my.enc:
============
/my [
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /mychar /mychar /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
] def
============

File test.tex:
============
\pdfglyphtounicode{mychar}{269}
\pdfgentounicode=1
\pdfmapline{cmb10

Hans Hagen

8:47 p.m.

On 12/15/2017 7:12 PM, Pali Rohár wrote:

...

On Friday 15 December 2017 17:13:22 Karl Berry wrote:

...
(Sorry for the delayed reply.)

Date: Sat, 19 Aug 2017 16:02:17 +0200 From: Pali Rohár Subject: [PATCH v4] Allow .enc files for bitmap PK fonts

Thanks for splitting the patch into those separate pieces, Pali, and doing the test and documentation updates. Very helpful. Reading through the changes, they generally look fine.

My only question at the moment is, why do duplicate glyph names have to be removed in advance (in patch 3)? Otherwise we'll try to put two glyphs by the same (PostScript/PDF) name in the output font? Or something else? --thanks, karl.

Hi! Glyph names are put into /Differences PDF table and also glyphs itself are identified in PDF by its names. So we cannot have two different glyphs in PDF file with same name.

Where does the pdf standard mention that limitation? Why should glyph names be unique? If there is some encoding issue it looks more like there is a shared Differences related dictionary / array that should not be shared

...

Function remove_duplicate_glyph_names() just remove duplicate glyph names from enc file and later function writet3() for glyph index uses either glyph name or if is not available (e.g. because of duplicates), then it use name "a" (like before). This ensures that every glyph has a unique name in PDF file.

If you comment that remove_duplicate_glyph_names() then you would see what happen. pdftex would not be able to create PDF file with two different glyphs with same name and would store just one glyph. That would result in damaged PDF font, one glyph would be used for all characters which had associated that one glyph name in enc file. Probably it would be the glyph with highest index.

...

Test case for reproducing should be easy:

File my.enc: ============ /my [ /.notdef ... /mychar /mychar /.notdef ... ] def ============

Ok, but that is not related to pdf (as format) but to a bad vector and/or pdftex not taking the right one ... is messing around with names (thereby obscuring the problem) better than fixing the enc file? After all, now one of the glyphs will still have the wrong name.

...

File test.tex: ============ \pdfglyphtounicode{mychar}{269} \pdfgentounicode=1 \pdfmapline{cmb10
And result PDF file would not render glyph 'a' if function remove_duplicate_glyph_names() is disabled. There would be two glyphs 'b'.

-- ----------------------------------------------------------------- Hans Hagen | PRAGMA ADE Ridderstraat 27 | 8061 GH Hasselt | The Netherlands tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl -----------------------------------------------------------------

Pali Rohár

9:27 p.m.

On Friday 15 December 2017 20:47:30 Hans Hagen wrote:

...

On 12/15/2017 7:12 PM, Pali Rohár wrote:

...
On Friday 15 December 2017 17:13:22 Karl Berry wrote:

...
(Sorry for the delayed reply.)

Date: Sat, 19 Aug 2017 16:02:17 +0200 From: Pali Rohár Subject: [PATCH v4] Allow .enc files for bitmap PK fonts

Thanks for splitting the patch into those separate pieces, Pali, and doing the test and documentation updates. Very helpful. Reading through the changes, they generally look fine.

My only question at the moment is, why do duplicate glyph names have to be removed in advance (in patch 3)? Otherwise we'll try to put two glyphs by the same (PostScript/PDF) name in the output font? Or something else? --thanks, karl.

Hi! Glyph names are put into /Differences PDF table and also glyphs itself are identified in PDF by its names. So we cannot have two different glyphs in PDF file with same name.

Where does the pdf standard mention that limitation? Why should glyph names be unique? If there is some nencoding issue it more looks like there is a shared Differences related dictionary / array that should not be shared

In the /Differences table you assign a character code to each glyph name. Then in /CharProcs (for a Type 3 font) you assign a glyph definition to each glyph name. /CharProcs is a PDF dictionary (page 421 in PDF Reference version 1.7), and it is undefined what happens if a PDF dictionary contains the same key twice (page 59). Basically, a glyph is identified by its name, not by its character code, so two different character codes need to have two different glyph names (if those character codes render differently).
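To make the collision concrete, here is a minimal sketch in C. It is not pdftex code: the CharProcs struct, charprocs_put(), charprocs_get() and the overwrite-on-duplicate policy are assumptions chosen for illustration only, since the PDF Reference leaves the duplicate-key case undefined.

```c
#include <assert.h>
#include <string.h>

#define MAX_GLYPHS 256

/* Hypothetical model of a Type 3 /CharProcs dictionary:
   glyph name -> id of the stream that draws the glyph. */
typedef struct {
    const char *name[MAX_GLYPHS];
    int stream[MAX_GLYPHS];
    int count;
} CharProcs;

/* Insert an entry, or overwrite it if the key already exists.
   Returns 1 when an existing key was overwritten, 0 otherwise.
   Overwriting is an assumed policy; a real reader may behave differently. */
static int charprocs_put(CharProcs *d, const char *name, int stream)
{
    assert(d != NULL && name != NULL);
    for (int i = 0; i < d->count; i++) {
        if (strcmp(d->name[i], name) == 0) {
            d->stream[i] = stream;
            return 1;
        }
    }
    assert(d->count < MAX_GLYPHS);
    d->name[d->count] = name;
    d->stream[d->count] = stream;
    d->count++;
    return 0;
}

/* Look up the stream registered for a glyph name; returns -1 if absent. */
static int charprocs_get(const CharProcs *d, const char *name)
{
    assert(d != NULL && name != NULL);
    for (int i = 0; i < d->count; i++) {
        if (strcmp(d->name[i], name) == 0)
            return d->stream[i];
    }
    return -1;
}
```

With the test vector from this thread, character codes 97 and 98 both carry the name mychar, so in this model the second charprocs_put() replaces the first and only one glyph procedure survives under that name.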

...

...
Function remove_duplicate_glyph_names() just remove duplicate glyph names from enc file and later function writet3() for glyph index uses either glyph name or if is not available (e.g. because of duplicates), then it use name "a" (like before). This ensures that every glyph has a unique name in PDF file.

If you comment that remove_duplicate_glyph_names() then you would see what happen. pdftex would not be able to create PDF file with two different glyphs with same name and would store just one glyph. That would result in damaged PDF font, one glyph would be used for all characters which had associated that one glyph name in enc file. Probably it would be the glyph with highest index.

...
Test case for reproducing should be easy:

File my.enc: ============ /my [ /.notdef ... /mychar /mychar /.notdef ... ] def ============

Ok, but that is not related to pdf (as format)

It is related to the PDF format; see above. More details are in the PDF specification itself: in version 1.7, see section "5.5 Simple Fonts", starting at page 412.

...

but to a bad vector and/or pdftex not taking the right one ... is messing around with names (thereby obscuring the problem) better than fixing the enc file? After all, now one of the glyphs will still have the wrong name.

Basically there are two different things:

1) Glyph names

2) CMap encoding table

The CMap table holds the mapping from character code to a Unicode (code point) sequence, and PDF viewers should use this mapping table to assign a Unicode code point to each glyph they render.

But the reality is that there are "not so good" PDF viewers which ignore the CMap table stored in the PDF file and do some mapping from glyph name to Unicode code point.

It looks like pdftex currently generates the CMap from glyph names. Theoretically it should be possible to assign a fully unique glyph name to every glyph, possibly fully random, and then put the correct mapping for all character codes into the CMap table according to the enc file (as the CMap table does not use glyph names).

Correct PDF viewers which use the CMap table will load the character ==> Unicode mapping from the CMap table. "Not so good" PDF viewers stay broken.
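As a rough illustration of why the CMap route does not depend on glyph names, the following C sketch formats a bfchar section of a /ToUnicode CMap for a simple one-byte font. The function names and buffer handling are hypothetical, only BMP code points are handled, and this is not how pdftex generates its CMap.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one bfchar entry for a one-byte character code, e.g. "<61> <0062>".
   Returns the snprintf() result (length the entry needs, excluding NUL). */
static int format_bfchar(char *buf, size_t size, unsigned code, unsigned unicode)
{
    assert(buf != NULL && code <= 0xFF && unicode <= 0xFFFF);
    return snprintf(buf, size, "<%02X> <%04X>", code, unicode);
}

/* Write a complete "N beginbfchar ... endbfchar" block into buf.
   Returns the number of characters written, or -1 if buf is too small.
   CMap blocks are limited to 100 entries each, so n is capped here. */
static int format_bfchar_section(char *buf, size_t size,
                                 const unsigned *codes,
                                 const unsigned *unicodes, int n)
{
    size_t used;
    int w;

    assert(buf != NULL && codes != NULL && unicodes != NULL);
    assert(n > 0 && n <= 100);

    w = snprintf(buf, size, "%d beginbfchar\n", n);
    if (w < 0 || (size_t)w >= size)
        return -1;
    used = (size_t)w;

    for (int i = 0; i < n; i++) {
        w = format_bfchar(buf + used, size - used, codes[i], unicodes[i]);
        /* need room for the entry, a newline and the terminating NUL */
        if (w < 0 || (size_t)w + 1 >= size - used)
            return -1;
        used += (size_t)w;
        buf[used++] = '\n';
        buf[used] = '\0';
    }

    w = snprintf(buf + used, size - used, "endbfchar\n");
    if (w < 0 || (size_t)w >= size - used)
        return -1;
    return (int)(used + (size_t)w);
}
```

Here two character codes can map to the same Unicode value without any glyph name being involved, which is the property the paragraph above relies on.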

...

...
File test.tex: ============ \pdfglyphtounicode{mychar}{269} \pdfgentounicode=1 \pdfmapline{cmb10
And result PDF file would not render glyph 'a' if function remove_duplicate_glyph_names() is disabled. There would be two glyphs 'b'.

-- Pali Rohár pali.rohar@gmail.com

Hans Hagen

18 Dec 2017

11:17 a.m.

On 12/15/2017 9:27 PM, Pali Rohár wrote:

...

1) Glyph names

2) CMap encoding table

In CMap table is mapping from the character code to Unicode (codepoint) sequence. And PDF viewers should use this mapping table to assign Unicode codepoint for particular glyph which render.

But reality is that there are "not so good" PDF viewers which ignores CMap table stored in PDF file and do some mapping from glyph name to Unicode codepoint.

As type 1 can be mapped onto a wide font the glyph name is probably less an issue there so there most of the encoding data can be omitted. In cff 2 even less is needed. For copy paste the tounicode is needed and when absent glyph names play an (unreliable) role. My experience is that acrobat normally does things right (but has some weird limitations in the renderer), mupdf based viewers render perfect and do a reasonable cut and paste and that xpdf and friends are unreliable with cut and paste and have rendering issues too. So, when you create extra glyph names for type 3 they need to (somehow) obey the adobe logic (alpha.foo alongside alpha) as appending some number or character will spoil the cut and paste (depending on the viewer).

...

It looks like that currently pdftex generates CMap from glyph names. Theoretically it should be possible to assign fully unique glyph names for every one glyph, possible fully random and then into CMap table put correct mapping for all character codes (as CMap table does not use glyph names) according to enc file.

that would confuse some viewers too (i remember some thread about non standard ffi ligature names and resolving hard coded in some viewer and the request for tex related fonts to conform to that bad practice too)

...

Correct PDF viewers which use CMap table will load character ==> Unicode mapping from CMap table. "not so good" PDF viewers stay broken.

indeed, or worse: behave inconsistently across releases (which makes it hard to predict)

...

...
...
File test.tex: ============ \pdfglyphtounicode{mychar}{269} \pdfgentounicode=1 \pdfmapline{cmb10
And result PDF file would not render glyph 'a' if function remove_duplicate_glyph_names() is disabled. There would be two glyphs 'b'.

but still i think that the fact that there are duplicate names in my.enc file is the real problem: if two b's refer to different shapes then what is the real 'b'? And what is the right new name: b.one, b.two ? What does one expect with cut and paste?

If two names are the same and they refer to the same font program then there is no problem and the first one encountered when embedding should be used.

If remove duplicates is an option in pdftex then at least make sure that it's off by default (better complain loudly on the console that the enc is broken) so that the user knows that enabling that option is not solving the problem (and in tex distributions the fixed enc should be used). Heuristics and fixes for bugged fonts are nice but not being able to bypass them is bad.

(multiple .notdef is an exception)

Hans

----------------------------------------------------------------- Hans Hagen | PRAGMA ADE Ridderstraat 27 | 8061 GH Hasselt | The Netherlands tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl -----------------------------------------------------------------

Pali Rohár

12:40 p.m.

On Monday 18 December 2017 11:17:45 Hans Hagen wrote:

...

...
It looks like that currently pdftex generates CMap from glyph names. Theoretically it should be possible to assign fully unique glyph names for every one glyph, possible fully random and then into CMap table put correct mapping for all character codes (as CMap table does not use glyph names) according to enc file.

that would confuse some viewers too (i remember some thread about non standard ffi ligature names and resolving hard coded in some viewer and the request for tex related fonts to conform to that bad practice too)

The first occurrence of a duplicate can use the originally specified glyph name, and the second, third, ... occurrences can use a new unique glyph name (with a proper CMap table). Yes, that would not fix the problem for those "some" viewers, but in this situation it is better than nothing.
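A minimal sketch of that renaming rule in C, under stated assumptions: the vector has 256 fixed-size entries, .notdef may repeat freely, and later duplicates get a fallback name of the form a<code>. The function name, the fallback format and the storage layout are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ENC_SIZE 256
#define NAME_MAX_LEN 32

/* Keep the first occurrence of every glyph name and rename each later
   duplicate to "a<code>" (an assumed fallback form). Multiple .notdef
   entries are left alone. Returns the number of renamed entries.
   Caveat: the sketch does not check whether a fallback name collides
   with a name that is already present in the vector. */
static int uniquify_glyph_names(char names[ENC_SIZE][NAME_MAX_LEN])
{
    int renamed = 0;

    assert(names != NULL);
    for (int i = 0; i < ENC_SIZE; i++) {
        if (strcmp(names[i], ".notdef") == 0)
            continue;
        for (int j = 0; j < i; j++) {
            if (strcmp(names[i], names[j]) == 0) {
                /* code i repeats the name first seen at code j */
                snprintf(names[i], NAME_MAX_LEN, "a%d", i);
                renamed++;
                break;
            }
        }
    }
    return renamed;
}
```

Applied to the my.enc vector from this thread, the entry at code 97 would keep the name mychar and the entry at code 98 would become a98; the copy+paste mapping for code 98 would then have to come from the CMap table rather than from the glyph name.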

...

...
...
...
File test.tex: ============ \pdfglyphtounicode{mychar}{269} \pdfgentounicode=1 \pdfmapline{cmb10
And result PDF file would not render glyph 'a' if function remove_duplicate_glyph_names() is disabled. There would be two glyphs 'b'. but still i think that the fact that there are duplicate names in my.enc file is the real problem: if two b's refer to different shapes then what is the real 'b'? And what is the right new name: b.one, b.two ?

If you have two shapes for b, then you can assign the glyph name 'b' to only one shape in the final PDF. What you can do is create a CMap table where both characters are mapped to the Unicode code point for 'b'.

PDF viewers which do not use the CMap would not be able to copy+paste properly. But this is the current situation, as /ToUnicode is not supported for Type3 fonts yet.

Anyway, exactly the same problem exists for Type 1 fonts. If you have two different shapes for b in a Type 1 font, then only one can have the glyph name 'b'.

...

What does one expect with cut and paste?

The expected behavior for an ordinary user is simple: both glyphs which are marked as 'b' should be copied as the character 'b'.

It can work only in PDF viewers with correct CMap support. But with the current pdftex code it is not possible.

But you are right that this is a real problem. Some calligraphic fonts have more than one glyph for one character, and the decision about which glyph to use is based on the previous or next characters.

...

If two names are the same and they refer to the same font program then there is no problem and the first one encountered when embedding should be used.

If remove duplicates is an option in pdftex then at least make sure that it's off by default (better complain loudly on the console that the enc is broken)

Do you want this problem to be a fatal error?

...

so that the user knows that enabling that option is not solving the problem (and in tex distributions the fixed enc should be used). Heuristics and fixes for bugged fonts are nice but not being able to bypass them is bad.

I thought it would be better to produce a PDF file, as the enc file itself does not change how the PDF file is rendered. It affects only copy+paste from the PDF file.

...

(multiple .notdef is an exception)

A different, but maybe more interesting question is: what happens for other font formats if the supplied enc file contains duplicate names?

-- Pali Rohár pali.rohar@gmail.com

Hans Hagen

1:11 p.m.

On 12/18/2017 12:40 PM, Pali Rohár wrote:

...

On Monday 18 December 2017 11:17:45 Hans Hagen wrote:

...
...
It looks like that currently pdftex generates CMap from glyph names. Theoretically it should be possible to assign fully unique glyph names for every one glyph, possible fully random and then into CMap table put correct mapping for all character codes (as CMap table does not use glyph names) according to enc file.

that would confuse some viewers too (i remember some thread about non standard ffi ligature names and resolving hard coded in some viewer and the request for tex related fonts to conform to that bad practice too)

First occurrence of duplicate can use originally specified glyph name and second, third, ... occurrences can use newly unique glyph name (with proper CMap table). Yes, that would not fix problem for those "some" viewers but in this situation it is better then nothing.

Two 'same' names in an enc file not referring to the same glyph is a bugged enc file. Personally I would not use such a font.

...

...
...
...
...
File test.tex: ============ \pdfglyphtounicode{mychar}{269} \pdfgentounicode=1 \pdfmapline{cmb10
And result PDF file would not render glyph 'a' if function remove_duplicate_glyph_names() is disabled. There would be two glyphs 'b'. but still i think that the fact that there are duplicate names in my.enc file is the real problem: if two b's refer to different shapes then what is the real 'b'? And what is the right new name: b.one, b.two ?

If you have two shapes for b, then you can assign glyph name 'b' only just for one shape in final PDF. What you can do is to create CMap table where both characters would be mapped to unicode code point for 'b'.

in that case the enc file should have dollar and dollar.oldstyle or b and b.smallcaps i.e. a proper name, not something arbitrary

...

PDF viewers which do not use CMap would not be able to copy+paste properly. But this is current situation as /ToUnicode is not supported for Type3 fonts yet.

if one follows the adobe glyph name convention it should work ok (at least in acrobat, mupdf)

...

Anyway, exactly same problem is for Type 1 fonts. If you have two different shapes for b in Type 1 font, then only one can have glyph name 'b'.

i've never seen a type 1 font with two 'same names' for different shapes ... it would qualify as 'a font to avoid'

...

...
What does one expect with cut and paste?

The expected behavior for ordinary user is simple: Both glyphs which are marked as 'b' should be copied as character 'b'.

It can work only in PDF viewers with correct CMap support. But with current pdftex code it is not possible.

viewers can use the names instead

...

But you are right that this is a real problem. Some calligraphic fonts have more glyphs for one character. And decision which glyph needs to be used is based on previous or next characters.

then there's something like a.varianta, a.variantb, a.variantc and a cut and paste will use the 'a' part to identify the name, just like f_f_i is a convention for a ligature
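The naming convention referred to here (drop everything from the first period, treat underscores as ligature component separators) can be sketched in C as follows. The helper names are hypothetical, and the sketch only splits names; it does not look anything up in a glyph list or model what any particular viewer does.

```c
#include <assert.h>
#include <string.h>

/* Copy the part of a glyph name before the first '.' into out,
   e.g. "a.varianta" -> "a". The result is truncated to fit size.
   Returns the length of the base name stored in out. */
static size_t glyph_base_name(const char *name, char *out, size_t size)
{
    size_t n;

    assert(name != NULL && out != NULL && size > 0);
    n = strcspn(name, ".");     /* index of first '.', or full length */
    if (n >= size)
        n = size - 1;
    memcpy(out, name, n);
    out[n] = '\0';
    return n;
}

/* Count the underscore-separated components of a base name,
   e.g. "f_f_i" -> 3. An empty base name has 0 components. */
static int glyph_component_count(const char *base)
{
    int count;

    assert(base != NULL);
    if (*base == '\0')
        return 0;
    count = 1;
    for (; *base != '\0'; base++) {
        if (*base == '_')
            count++;
    }
    return count;
}
```

Under this rule a.varianta, a.variantb and a.variantc all reduce to the base name a, while a name such as .notdef reduces to an empty base name and therefore maps to nothing.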

...

...
If two names are the same and they refer to the same font program then there is no problem and the first one encountered when embedding should be used.

If remove duplicates is an option in pdftex then at least make sure that it's off by default (better complain loudly on the console that the enc is broken)

Do you want to be this problem a fatal error?

Fatal in the sense that a viewer crashes? Sure. Then at least I know that the 'b' in a font is probably not a 'b'. Also, in that case it's a signal to avoid that font. (The same can be true for embedding fonts with bad font names that clash.)

FYI: I decided (in context with luatex at least) to *not* use the fontloader but write one in lua that stays close to the original font and avoids the usual heuristics ... it's hard to fight (bad or fuzzy) heuristics as they obscure problems.

...

...
so that the user knows that enabling that option is not solving the problem (and in tex distributions the fixed enc should be used). Heuristics and fixes for bugged fonts are nice but not being able to bypass them is bad.

I thought it would be better to produce PDF file as enc file itself does not change how PDF file is rendered. It affects only copy+paste from PDF file.

But why not fix the enc file?

...

...
(multiple .notdef is an exception)

Different, but maybe more interesting question is: What happens for other font formats if supplied enc file contains duplicate names?

I can only speak for luatex: we don't use enc files for type 1 and opentype. And even for type 3 (which i never use) I'd avoid them. In fact, everything related to encodings is already dealt with when the font is defined (loaded), and an afm or pfb file is normally ok.

Makes me wonder how these bad enc files can show up at all, as those type 3 fonts are very old school and therefore the problem of duplicate names for different shapes should also have been seen with dvips and so on.

Hans ----------------------------------------------------------------- Hans Hagen | PRAGMA ADE Ridderstraat 27 | 8061 GH Hasselt | The Netherlands tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl -----------------------------------------------------------------

Pali Rohár

1:37 p.m.

On Monday 18 December 2017 13:11:20 Hans Hagen wrote:

...

...
Anyway, exactly same problem is for Type 1 fonts. If you have two different shapes for b in Type 1 font, then only one can have glyph name 'b'.

i've never seen a type 1 font with two 'same names' for different shapes ... it would qualify as 'a font to avoid'

Yes, two 'same names' for different shapes is not something that is supported in Type1 (and maybe it is not possible... I have not looked at the specification in detail).

But... I mean this: how do you handle the situation where you create a font with two different shapes for the character 'b' and you want to store this font in Type1 format? You need to choose something like b.variant1 and b.variant2...

...

...
...
What does one expect with cut and paste?

The expected behavior for ordinary user is simple: Both glyphs which are marked as 'b' should be copied as character 'b'.

It can work only in PDF viewers with correct CMap support. But with current pdftex code it is not possible.

viewers can use the names instead

...
But you are right that this is a real problem. Some calligraphic fonts have more glyphs for one character. And decision which glyph needs to be used is based on previous or next characters.

then there's something like a.varianta, a.variantb, a.variantc and a cut and paste will use the 'a' part to identify the name, just like f_f_i is a convention for a ligature

...
...
If two names are the same and they refer to the same font program then there is no problem and the first one encountered when embedding should be used.

If remove duplicates is an option in pdftex then at least make sure that it's off by default (better complain loudly on the console that the enc is broken)

Do you want to be this problem a fatal error?

Fatal in the sense that a viewer crashes?

I mean in the context of pdftex. What should pdftex do if it gets such an enc file on input?

...

Sure. Then at least I know that the 'b' in a font is probably not a 'b'. Also, in that case it's a signal to avoid that font. (The same can be true for embedding fonts with bad font names that clash.)

FYI: I decided (in context with luatex at least) to *not* use the fontloader but write one on lua that stays close to the original font and avoids the usual heuristics ... it's hard to fight (bad or fuzzy) heuristics as they obscure problems.

...
...
so that the user knows that enabling that option is not solving the problem (and in tex distributions the fixed enc should be used). Heuristics and fixes for bugged fonts are nice but not being able to bypass them is bad.

I thought it would be better to produce PDF file as enc file itself does not change how PDF file is rendered. It affects only copy+paste from PDF file.

But why not fix the enc file?

Yes, the proper way is to fix the enc file. I think we have no doubts about it. But the whole discussion is... what should pdftex do if the user supplies such a buggy enc file for a particular font?

...

Makes me wonder how these bad enc files can show up at all, as those type 3 fonts are very old school and therefore the problem of duplicate names for different shapes should also have been seen with dvips and so.

Personally, I do not know how many enc files contain duplicates, or whether any of them are widely used. For the pdftex patches, I specially prepared different enc files to test that pdftex with my patches does not crash (it either prints a fatal warning or produces a PDF) and that the PDF file is always rendered the same way, as enc files must not affect how the PDF file is rendered. I observed a problem when an enc file contains one glyph name more than once, therefore I added the code which deals with duplicates. -- Pali Rohár pali.rohar@gmail.com

Hans Hagen

3:46 p.m.

On 12/18/2017 1:37 PM, Pali Rohár wrote:

...

On Monday 18 December 2017 13:11:20 Hans Hagen wrote:

...
...
Anyway, exactly the same problem exists for Type 1 fonts. If you have two different shapes for b in a Type 1 font, then only one can have the glyph name 'b'.

i've never seen a type 1 font with two 'same names' for different shapes ... it would qualify as 'a font to avoid'

Yes, two 'same names' for different shapes is not something which is supported in Type1 (and maybe it is not possible... I have not looked at the specification in detail).

But... I mean this: how do you handle the situation where you create a font in which there are two different shapes for the character 'b' and you want to store this font in Type1 format? You need to choose something like b.variant1 and b.variant2...

indeed, and given that both are named 'b' you can use that as prefix ... it doesn't matter what the suffix(es) are

...

I mean in the context of pdftex. What should pdftex do if it gets such an enc file on input?

probably what it does now (ok, it could complain that it has two b's, but if these two b's refer to the same type 1 font program (glyph sub) then it's quite ok and one can remap the second one onto the first) ... but actually one might wonder if the front end code should be fixed, i.e. who can guarantee that the matching tfm file is ok? Two variants in a simple font (t3 / 8bit type1) are quite confusing anyway.

...

Yes, the proper way is to fix the enc file. I think we have no doubts about it.

But the whole discussion is... what should pdftex do if the user supplies such a buggy enc file for a particular font?

complain and quit

...

...
Makes me wonder how these bad enc files can show up at all, as those type 3 fonts are very old school and therefore the problem of duplicate names for different shapes should also have been seen with dvips and so.

Personally, I do not know how many enc files contain duplicates, or whether any of them are widely used.

ok, so actually there's not a problem

...

For the pdftex patches, I specially prepared different enc files to test that pdftex with my patches does not crash (it either prints a fatal warning or produces a PDF) and that the PDF file is always rendered the same way, as enc files must not affect how the PDF file is rendered. I observed a problem when an enc file contains one glyph name more than once, therefore I added the code which deals with duplicates.

well, if you apply that patch, then at least let pdftex complain very loud on the console and in the log that the resulting pdf is probably bugged (even with the patch) due to conflicts in the encoding (i don't mind that much about a patch because i don't use pdftex so i'm unlikely to be a victim of such a bad font .. so in the end it's karl who has to agree)

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------

Pali Rohár

3:50 p.m.

On Monday 18 December 2017 15:46:13 Hans Hagen wrote:

...

...
For the pdftex patches, I specially prepared different enc files to test that pdftex with my patches does not crash (it either prints a fatal warning or produces a PDF) and that the PDF file is always rendered the same way, as enc files must not affect how the PDF file is rendered. I observed a problem when an enc file contains one glyph name more than once, therefore I added the code which deals with duplicates.

well, if you apply that patch, then at least let pdftex complain very loud on the console and in the log that the resulting pdf is probably bugged (even with the patch) due to conflicts in the encoding

My patch calls pdftex_warn(), which prints a warning on the console. The resulting PDF is not buggy, as duplicate glyph names are replaced by "a<num>" (the same glyph name as before my patch).
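The behavior described here can be sketched roughly as follows. This is a simplified Python illustration, not the C code from the patch. The function name `dedupe_encoding` is hypothetical, and two details are assumptions: that `<num>` is the character code of the slot, and that `.notdef` entries are left alone. The `fatal` flag stands in for the "complain and quit" alternative raised earlier in the thread.

```python
def dedupe_encoding(glyph_names, warn=print, fatal=False):
    """Return a copy of an encoding vector with duplicate names replaced.

    glyph_names: list of glyph names indexed by character code.
    The first occurrence of a name is kept; later ones become "a<code>".
    """
    seen = set()
    result = []
    for code, name in enumerate(glyph_names):
        # Assumption: ".notdef" may legitimately repeat and is left alone.
        if name != ".notdef" and name in seen:
            message = "duplicate glyph name '%s' at code %d" % (name, code)
            if fatal:
                # The stricter policy: refuse the broken enc file.
                raise ValueError(message)
            # Assumption: <num> is the character code of this slot.
            # This sketch does not check whether "a<code>" is itself taken.
            replacement = "a%d" % code
            warn("%s, using '%s'" % (message, replacement))
            result.append(replacement)
        else:
            seen.add(name)
            result.append(name)
    return result


print(dedupe_encoding(["a", "b", "b", ".notdef"]))
```

Whether the warning branch or the fatal branch is the right default is exactly the policy question discussed in this thread.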

...

(i don't mind that much about a patch because i don't use pdftex so i'm unlikely to be a victim of such a bad font .. so in the end it's karl who has to agree)

Ok, so it is up to Karl now. Changing pdftex_warn() to a fatal error is trivial if it is better to disallow the use of such broken enc files. -- Pali Rohár pali.rohar@gmail.com

Karl Berry

2 Jan

1:18 a.m.

Pali, I've installed your patches into the pdftex (r790) and TeX Live (r46189) repositories. I only tweaked some of the wording in comments, doc, etc., and did not find a need to change any of your code. I expect Akira will be compiling a new pdftex with these changes for his w32tex distribution, so the changes will get some testing. Thanks for all your work and perseverance. --best, karl.

Pali Rohár

9:19 a.m.

On Tuesday 02 January 2018 00:18:48 Karl Berry wrote:

...

Pali, I've installed your patches into the pdftex (r790) and TeX Live (r46189) repositories. I only tweaked some of the wording in comments, doc, etc., and did not find a need to change any of your code.

I expect Akira will be compiling a new pdftex with these changes for his w32tex distribution, so the changes will get some testing.

Thanks for all your work and perseverance. --best, karl.

Great! If you find any other problems, let me know and I will try to fix them. -- Pali Rohár pali.rohar@gmail.com
