searchable PDF with MinionPro under mkiv
How can I generate a searchable PDF with MkIV, using a non-standard font like MinionPro?

    \definefontfeature [default] [default] [mode=node,script=latn,onum=yes]
    \usemodule[simplefonts]
    \setmainfont[minionpro]

    \starttext
    fi ff ffi ffl 1234567890
    \stoptext

Using pdftotext, I get this:

    fi ff ffi ffl

However, using Adobe Reader these things won't be found. It should read:

    fi ff ffi ffl 1234567890

Using LaTeX, one would use \input glyphtounicode.tex \pdfgentounicode=1, but this doesn't seem to work with ConTeXt. ConTeXt used pdfr-def, but this seems to be MkII-only.

TIA, olli

--
Oliver Heins heins@sopos.org http://www.sopos.org/olli
GPG: F27A BA8C 1CFB B905 65A8 2544 0F07 B675 9A00 D827
1024D/9A00D827 2004-09-24 -- gpg --recv-keys 0x9A00D827
Please avoid sending me Word or PowerPoint attachments: http://www.gnu.org/philosophy/no-word-attachments.html
2011/1/17 Oliver Heins
Hi Oliver,

Your example works for me with the beta 2011.01.14 and pdftotext-0.16.0. Which version of ConTeXt MkIV are you using?

Your problem looks like "http://www.ntg.nl/pipermail/ntg-context/2010/052259.html" but that one has been solved by Taco.

--
Best regards, Li Yanrui (李延瑞)
"Li Yanrui (李延瑞)"
Hi Li,

    ConTeXt ver: 2011.01.12 10:20 MKIV fmt: 2011.1.12

This is quite a recent version; however, I updated my minimals, so now I have:

    ConTeXt ver: 2011.01.14 14:44 MKIV fmt: 2011.1.17

The result stays the same. My pdftotext is an older version (0.12.4), but that shouldn't be a problem.

Adobe Reader, evince and xpdf are able to find the ligatures, but not the numbers. okular even fails to find the ligatures, but I would consider this a bug in okular.

Best regards, olli
Hi Oliver,

it works for me with the beta 2011.01.12 and 2011.01.14 and poppler-0.14.5/poppler-0.16.0.

However, it turns out that pdftotext converts to

    fi ff ffi ffl 1234567890,

splitting the fi ligature into separate letters while leaving ff, ffi and ffl intact as ligature characters, which is strange.

I did not try with Adobe Reader, but the PDF is searchable with Apple Preview and the pasted copy is still intact:

    fi ff ffi ffl 1234567890

Florian
Hi Florian,
For me, it still doesn't work. I get oldstyle numbers in the text, and the numbers are not searchable in Adobe Reader, okular, evince or xpdf.

However, I figured out that it is my version of the font that causes the wrong result:

    $ otfinfo -i /usr/local/share/fonts/MinionPro_Regular.otf
    Family: Minion Pro
    Subfamily: Regular
    Full name: Minion Pro
    PostScript name: MinionPro-Regular
    Version: OTF 1.011;PS 001.000;Core 1.0.27;makeotf.lib1.3.1
    Unique ID: 1.011;ADBE;MinionPro-Regular
    Designer: Robert Slimbach
    Vendor URL: http://www.adobe.com/type/
    Trademark: Minion is either a registered trademark or a trademark of Adobe Systems Incorporated in the United States and/or other countries.
    Copyright: © 2000 Adobe Systems Incorporated. All Rights Reserved. U.S. Patent Des. 337,604. Other patents pending.
    License URL: http://www.adobe.com/type/legal.html

When using the MinionPro fonts shipped with Adobe Reader, I get the same results as you:

    $ otfinfo -i /usr/local/share/fonts/MinionPro-Regular.otf
    Family: Minion Pro
    Subfamily: Regular
    Full name: Minion Pro
    PostScript name: MinionPro-Regular
    Version: Version 2.068;PS 2.000;hotconv 1.0.57; makeotf.lib2.0.21895
    Unique ID: 2.068;ADBE;MinionPro-Regular
    Designer: Robert Slimbach
    Manufacturer: Adobe Systems Incorporated
    Vendor URL: http://www.adobe.com/type/
    Trademark: Minion is either a registered trademark or a trademark of Adobe Systems Incorporated in the United States and/or other countries.
    Copyright: © 1990, 1991, 1992, 1994, 1997, 1998, 2000, 2002, 2004 Adobe Systems Incorporated. All rights reserved.
    License URL: http://www.adobe.com/type/legal.html

Does this have to be considered a bug in the font?

Best regards, olli
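When two font files share the same PostScript name, as above, it can help to compare the `otfinfo -i` fields mechanically. The following is only a sketch in Python: the helper names `parse_otfinfo` and `differing_fields` are hypothetical, and it assumes the `Key: value` line layout shown in the output above.

```python
def parse_otfinfo(text):
    """Parse 'Key: value' lines (as printed by `otfinfo -i`) into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split at the first colon only, so URLs in the value stay whole.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def differing_fields(a, b):
    """Return the sorted names of fields whose values differ or are missing."""
    return sorted(k for k in set(a) | set(b) if a.get(k) != b.get(k))


# Excerpts of the two outputs quoted in this thread.
old_font = parse_otfinfo(
    "PostScript name: MinionPro-Regular\n"
    "Version: OTF 1.011;PS 001.000;Core 1.0.27;makeotf.lib1.3.1\n"
)
new_font = parse_otfinfo(
    "PostScript name: MinionPro-Regular\n"
    "Version: Version 2.068;PS 2.000;hotconv 1.0.57; makeotf.lib2.0.21895\n"
)

print(differing_fields(old_font, new_font))
```

With these excerpts only the `Version` field differs, which matches the observation that the two files are different releases of the same font.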
You are right! I had not considered that. Depending on the font used, pdftotext expands (some of) the ligatures or not. With TeXGyre Pagella, for instance, there is no ligature expansion at all:

    fi ff ffi ffl 1234567890

and with Cambria I get a PDF which is not searchable with Preview:

    i ff fi fl 1234567890

Florian
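Since the extracted text may contain either plain letters or single ligature characters depending on the font, a comparison that ignores this difference can make such checks less confusing. Below is a minimal Python sketch, not part of the thread: the function names are hypothetical, and it assumes the text has already been extracted from the PDF (for example with pdftotext) into a string. It relies on Unicode NFKC normalization, which maps the Latin ligature characters U+FB00 to U+FB04 to their plain letters.

```python
import unicodedata

# Latin f-ligatures from the Unicode "Alphabetic Presentation Forms" block.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}


def find_ligatures(text):
    """Return the ligature characters found in text, in order of appearance."""
    return [ch for ch in text if ch in LIGATURES]


def missing_tokens(extracted, expected):
    """Return the expected tokens that are absent even after NFKC normalization."""
    words = unicodedata.normalize("NFKC", extracted).split()
    return [token for token in expected if token not in words]


expected = ["fi", "ff", "ffi", "ffl", "1234567890"]

# Hypothetical extraction: fi expanded, the other ligatures kept as single characters.
mixed = "fi \ufb00 \ufb03 \ufb04 1234567890"
# Hypothetical extraction: all ligatures kept, digits missing.
no_digits = "\ufb01 \ufb00 \ufb03 \ufb04"

print(find_ligatures(mixed))
print(missing_tokens(mixed, expected))
print(missing_tokens(no_digits, expected))
```

The first sample passes the check even though it mixes expanded and unexpanded ligatures; the second reports the digits as missing, which is the symptom described at the start of the thread.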
participants (3)
- Florian Wobbe
- Li Yanrui (李延瑞)
- Oliver Heins