Reading tounicode from shared fonts
Hi,
While debugging an issue with PDF /ToUnicode, I found that the code for reading character-level tounicode from the fonts always takes the values from the first of the fonts it merges in the PDF. Usually this is not an issue, since the font loader will have identical tounicode values for all characters when the same font is loaded multiple times. However, my code sets character-level tounicode only after processing the nodes (to avoid parsing the cmap and GSUB tables ahead of typesetting), so the same font loaded multiple times can have different tounicodes, and characters used only in later instances of the font will not have their tounicodes in the PDF file.
It seems to be a one-character fix to check the font the character is used in instead, and it looks to me like a typo that has gone unnoticed since the revision that introduced this code (r710). Patch attached.
Regards, Khaled
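For readers without the patch at hand, the behaviour described above can be modelled roughly as follows. This is only a sketch in Python, not the actual LuaTeX C code: `FontInstance`, `collect_from_first`, and `collect_from_owner` are hypothetical names invented for illustration, and the merge step is reduced to walking a list of instances of the same font file.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class FontInstance:
    """One loaded instance of a font file (hypothetical model, not LuaTeX's structs)."""
    name: str
    tounicode: Dict[int, str] = field(default_factory=dict)  # glyph index -> hex string
    used: Set[int] = field(default_factory=set)               # glyphs typeset with this instance


def collect_from_first(instances: List[FontInstance]) -> Dict[int, str]:
    """Reported behaviour: always look the value up in the first merged instance."""
    first = instances[0]
    result = {}
    for inst in instances:
        for glyph in inst.used:
            if glyph in first.tounicode:
                result[glyph] = first.tounicode[glyph]
    return result


def collect_from_owner(instances: List[FontInstance]) -> Dict[int, str]:
    """Proposed behaviour: look the value up in the instance the glyph was used in."""
    result = {}
    for inst in instances:
        for glyph in inst.used:
            if glyph in inst.tounicode:
                result[glyph] = inst.tounicode[glyph]
    return result


# Two instances of the same font; tounicode was set lazily, so each instance
# only knows about the glyphs it actually typeset.
a = FontInstance("shared.otf", {1: "0041"}, {1})
b = FontInstance("shared.otf", {2: "0042"}, {2})

buggy = collect_from_first([a, b])
fixed = collect_from_owner([a, b])
```

In this model, glyph 2 is used only in the second instance, so the first lookup strategy drops its mapping while the second keeps it.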
On Wed, Nov 14, 2018 at 6:23 PM Khaled Hosny wrote:
Hi,
While debugging an issue with PDF /ToUnicode, I found that the code for reading character-level tounicode from the fonts always takes the values from the first of the fonts it merges in the PDF. Usually this is not an issue, since the font loader will have identical tounicode values for all characters when the same font is loaded multiple times. However, my code sets character-level tounicode only after processing the nodes (to avoid parsing the cmap and GSUB tables ahead of typesetting), so the same font loaded multiple times can have different tounicodes, and characters used only in later instances of the font will not have their tounicodes in the PDF file.
It seems to be a one-character fix to check the font the character is used in instead, and it looks to me like a typo that has gone unnoticed since the revision that introduced this code (r710). Patch attached.
Thank you for the report; we will look at it asap.
-- luigi
Participants (2): Khaled Hosny, luigi scarso