Ligature handling for PDF searching.
(This came up on comp.text.tex in a question about LaTeX, but it also applies to ConTeXt, and the proposed solution for LaTeX doesn't apply.)

Consider the following document:

    \starttext
    Some ligature tests: ff, fi, ffi, fl, ffl.
    \stoptext

If I process that with texexec -pdf, load it into Acrobat 5, and then copy-and-paste the text from the PDF into a text editor, the fi and fl ligatures are correctly treated as two letters, but the ff, ffi, and ffl ligatures are treated as single (unknown) characters. Similarly, searching for "f" within the document only finds the fi and fl ligatures; it doesn't find the others. Searching for "ff" finds nothing.

This is a fairly significant problem in the on-screen usability of ConTeXt-created documents.

In LaTeX, there is apparently a solution in the cmap.sty package (though it currently only works for T1 encoding): http://www.ctan.org/tex-archive/macros/latex/contrib/cmap/

Is there a similar solution for ConTeXt? (Has this perhaps been solved with a later version of ConTeXt than I have on my computer?)

Thanks,
- Brooks
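The search failure described above can be imitated outside a PDF viewer. The following is only a rough sketch, not what Acrobat does internally: it assumes the extracted ligatures arrive as the Unicode presentation forms U+FB00 to U+FB04 (the thread only says they come out as "unknown" characters), and the helper names are made up for illustration. It shows why a search for "f" or "ff" misses text until each ligature is mapped back to its letters.

```python
import unicodedata

# Assumption: the single characters a viewer extracts are modelled here by
# the Unicode presentation-form ligatures U+FB00..U+FB04.
LIGATURES = {
    "ff": "\ufb00",
    "fi": "\ufb01",
    "fl": "\ufb02",
    "ffi": "\ufb03",
    "ffl": "\ufb04",
}

def extract_without_mapping(text):
    """Hypothetical helper: replace letter runs by one ligature character each."""
    # Longest sequences first, so "ffi" is not consumed as "ff" + "i".
    for letters in sorted(LIGATURES, key=len, reverse=True):
        text = text.replace(letters, LIGATURES[letters])
    return text

def extract_with_mapping(text):
    """Hypothetical helper: map ligature characters back to plain letters."""
    # NFKC compatibility normalization decomposes U+FB00..U+FB04 into letters.
    return unicodedata.normalize("NFKC", text)

source = "Some ligature tests: ff, fi, ffi, fl, ffl."
raw = extract_without_mapping(source)
fixed = extract_with_mapping(raw)

print("ff" in raw)      # search on the unmapped text
print("ff" in fixed)    # search after mapping ligatures to letters
print(fixed == source)  # round trip back to the typed text
```

A glyph-to-letters mapping embedded in the PDF plays the role of `extract_with_mapping` here, so the viewer can do the decomposition itself.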
Brooks Moses wrote:
Is there a similar solution for ConTeXt? (Has this perhaps been solved with a later version of ConTeXt than I have on my computer?)
that kind of stuff was introduced in context ages ago -)

take a look at:

    pdfr-il2
    enco-pfr

it's rather integrated and automatic, although i didn't test it recently (probably last in fall 2000)

the only thing needed is a pdfr-ec and pdfr-texnansi

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
Hi,

Attached is pdfr-ec.tex. I don't really understand what is going on, so the texnansi version is out of my reach. Also, I cannot/will not test because AR7 has no problem with ffi anyway.

Taco
_______________________________________________ ntg-context mailing list ntg-context@ntg.nl http://www.ntg.nl/mailman/listinfo/ntg-context
At 01:25 AM 7/27/2005, you wrote:
Attached is pdfr-ec.tex. I don't really understand what is going on, so the texnansi version is out of my reach. Also, I cannot/will not test because AR7 has no problem with ffi anyway.
I'm perfectly glad to test this, but I'm not at all sure how to use it. What do I need to do to use it? Thanks! - Brooks
I'm guessing:

    \input enco-pfr
    \startencoding [ec]
    \usepdffontresource ec
    \stopencoding
    \starttext
    fi ff ffi
    \stoptext

(at least this loads pdfr-ec.tex)

Taco
Taco Hoekwater wrote:
I'm guessing:
    \input enco-pfr
    \startencoding [ec]
    \usepdffontresource ec
    \stopencoding
    \starttext
    fi ff ffi
    \stoptext
(at least this loads pdfr-ec.tex)
Taco
it's hard to check with compressed files, but:

    \pdfcompresslevel=0
    \useencoding[pfr]
    \startencoding [ec]
    \usepdffontresource ec
    \stopencoding
    \usetypescript[palatino][ec]
    \setupbodyfont[palatino]
    \starttext
    fi ff ffi
    \stoptext

seems to work here; i'll add the file and definition to the distribution

Hans
Brooks Moses wrote:
(This came up on comp.text.tex in a question about LaTeX, but it also applies to ConTeXt, and the proposed solution for LaTeX doesn't apply.)
Consider the following document:
    \starttext
    Some ligature tests: ff, fi, ffi, fl, ffl.
    \stoptext

If I process that with texexec -pdf, load it into Acrobat 5, and then copy-and-paste the text from the PDF into a text editor, the fi and fl ligatures are correctly treated as two letters, but the ff, ffi, and ffl ligatures are treated as single (unknown) characters. Similarly, searching for "f" within the document only finds the fi and fl ligatures; it doesn't find the others. Searching for "ff" finds nothing.
This is a fairly significant problem in the on-screen usability of ConTeXt-created documents.
In LaTeX, there is apparently a solution in the cmap.sty package (though it currently only works for T1 encoding): http://www.ctan.org/tex-archive/macros/latex/contrib/cmap/
Is there a similar solution for ConTeXt? (Has this perhaps been solved with a later version of ConTeXt than I have on my computer?)
Yes, but AFAIK only for one or two encodings (CMAP files). I have to remember ... the keyword is \usepdffontresource. See the source file enco-pfr.tex for more info.

Vit
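For readers wondering what such a CMAP resource boils down to, here is a minimal sketch of the kind of entries a PDF ToUnicode CMap uses to map ligature slots to letters. It is not the content of pdfr-ec.tex, which is not shown in this thread. The slot numbers 0x1B to 0x1F are an assumption taken from the Cork (EC/T1) encoding layout, and the function names are hypothetical.

```python
# Assumption: f-ligature slots as laid out in the Cork (EC/T1) encoding.
EC_LIGATURE_SLOTS = {
    0x1B: "ff",
    0x1C: "fi",
    0x1D: "fl",
    0x1E: "ffi",
    0x1F: "ffl",
}

def bfchar_line(slot, letters):
    """Format one entry: <character code> <destination text as UTF-16BE hex>."""
    dest = letters.encode("utf-16-be").hex().upper()
    return "<{:02X}> <{}>".format(slot, dest)

def bfchar_block(slots):
    """Wrap the entries in the beginbfchar/endbfchar operators of a CMap."""
    lines = ["{} beginbfchar".format(len(slots))]
    lines += [bfchar_line(slot, slots[slot]) for slot in sorted(slots)]
    lines.append("endbfchar")
    return "\n".join(lines)

print(bfchar_block(EC_LIGATURE_SLOTS))
```

Each entry tells the viewer that one character code in the font stands for a sequence of letters, which is what makes copy-and-paste and searching for "ff" work. A separate resource per encoding is needed because the slot numbers differ between encodings such as ec and texnansi.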
participants (4)
- Brooks Moses
- Hans Hagen
- Taco Hoekwater
- Vit Zyka