How to make words searchable without diacritics
Dear List,
I have a lot of Latin words in a document with the length of the vowels indicated by diacritics, for example: fīlĭa.
Is it possible somehow to make these words searchable without the diacritics? That is, if I search for filia in the final PDF file, would fīlĭa also be found?
Regards
Marcus Vinicius
-- 
All things tire the body except music, which tires neither the body nor its limbs, for it is rest for the soul, springtime of the heart, distraction for the afflicted, entertainment for the lonely, and viaticum for the traveler. Kunnâsh al-Hâ'ik (Cancioneiro de al-Hâ'ik)
In Adobe Reader there is an option Preferences › Categories › Search › [ ] Ignore Diacritics and Accents which you can tick to search on the underlying letter only. If the search is for your own use only, this might be a solution that avoids changing the generated PDF.
On 5 Aug 2023, at 20:16, Marcus Vinicius Mesquita wrote:
> Dear List,
> I have a lot of latin words in a document with the length of the vowels indicated by diacritics, for example: fīlĭa.
> Is it possible somehow to make these words searchable without the diacritics? That is, if I make a search for filia in the final pdf file, fīlĭa would also be found?
— Bruce Horrocks Hampshire, UK
Dear Marcus Vinicius,
in PDF (the format itself), ActualText is a way of providing a text replacement for the displayed element. If you use ActualText, the string that is searched is the replacement text you provide. That way, a search for "filia" could find literally “whatever you want” (with "filia" as its ActualText).
Hans provides this jewel in back-imp-pdf.mkxl and back-pdf.mkiv (adapt it to your needs):
\starttext
text \pdfbackendactualtext{whatever you want}{filia} text
\stoptext
That being said, I think this is the wrong approach to your issue. Firefox also ignores diacritics by default when searching (at least for me, this is not a minor issue).
In any case, the PDF viewer used to search must have ActualText implemented.
I hope it helps,
Pablo
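Applied to the word from the original question, the snippet above could be wrapped in a small helper. This is only a sketch: \SearchableAs is a hypothetical name, not a ConTeXt command, and it assumes that \pdfbackendactualtext takes the displayed text first and the ActualText second, as in Pablo's example. It also assumes that the body font has glyphs for ī and ĭ.

```tex
% Hypothetical helper, not part of ConTeXt:
% #1 is typeset, #2 is stored as the ActualText that a capable viewer searches.
\protected\def\SearchableAs#1#2%
  {\pdfbackendactualtext{#1}{#2}}

\starttext
text \SearchableAs{fīlĭa}{filia} text
\stoptext
```

Whether a search for filia then succeeds still depends on the viewer implementing ActualText.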
On 06.08.23 at 20:37, Pablo Rodriguez wrote:
> In any case, the PDF viewer used to search must have ActualText implemented.
Exactly. Apple’s PDF library, for example, does not implement it; that library is used not only by Preview.app, but also by Skim and TeXShop. (I should check this with other viewers/libs.)
Hraban
Thank you for the answers, Bruce, Pablo and Hraban. I was not aware of
ActualText.
I work on Manjaro Linux, and I tested the example Pablo sent in
several programs:
mupdf-gl or mupdf: fails! [mupdf-gl is what I customarily use for its
blazing speed]
firefox: fails
vivaldi: passes
okular: passes
qpdfview: passes
evince: passes
But \pdfbackendactualtext is actually just what I needed, since it can
also be used for other things, like:
\starttext
what a \pdfbackendactualtext{\hyphenatedword{wonderful}}{wonderful} text
\stoptext
Best regards
Marcus Vinicius
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!
maillist : ntg-context@ntg.nl / https://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : https://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : https://contextgarden.net
___________________________________________________________________________________
On Mon, 7 Aug 2023 09:17:19 -0300, Marcus Vinicius Mesquita wrote:
> But \pdfbackendactualtext is actually just what I needed since it can be used also for other things like:
I don't think that it would be a good idea to use ActualText for this. You are effectively changing the content and meaning of your text, not only for search, but also for copy and paste, screen readers, HTML export, etc.
If you think it is okay to claim that the text is filia and that the accents are therefore irrelevant, then why don't you print filia directly?
-- 
Ulrike Fischer
http://www.troubleshooting-tex.de/
On 07.08.23 at 14:17, Marcus Vinicius Mesquita wrote:
> I work on a manjaro linux, and I tested the example Pablo sent on several programs:
> [...]
Thank you for researching! I’ll include this in my viewer matrix. (But probably not before the ConTeXt meeting.)
> But \pdfbackendactualtext is actually just what I needed since it can be used also for other things like:
> [...]
I’m not sure, but I’d guess ActualText is also suitable for alternative texts (AltText) of images? Wouldn’t it make sense to have an alttext key in \externalfigure for accessibility (PDF/UA)?
Hraban
@ Ulrike: This is what my client wants, and the client is always right.
Regards
Marcus Vinicius
On 8/7/2023 8:58 PM, Marcus Vinicius Mesquita wrote:
> @ Ulrike: This is what my client wants, and the client is always right.
You can try this:
\starttext
\protected\def\ProofOfConcept#1#2%
  {{#1\llap{\effect[hidden]{#2}}}}
test test \ProofOfConcept{föö}{foo} test
\stoptext
but forget about hyphenation (actualtext probably also doesn't always work well across lines in viewers).
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
This is perfect, as it works also with mupdf-gl and firefox!
Thank you, Hans
Kind regards
Marcus Vinicius
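For reference, here is Hans's proof of concept applied to the word from the original question. This is a sketch rather than a tested recipe: the macro is copied unchanged from his message, only the arguments differ, and it assumes that the body font has glyphs for ī and ĭ.

```tex
% \ProofOfConcept as posted by Hans in this thread:
% #1 is printed visibly; #2 is set as hidden text that \llap
% places over it to the left, so a plain-text search can match #2.
\protected\def\ProofOfConcept#1#2%
  {{#1\llap{\effect[hidden]{#2}}}}

\starttext
test \ProofOfConcept{fīlĭa}{filia} test
\stoptext
```

Because the visible and the hidden string are both present in the PDF, copying text from the file may yield both forms. Hans's caveat about hyphenation applies here as well.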
participants (6)
- Bruce Horrocks
- Hans Hagen
- Henning Hraban Ramm
- Marcus Vinicius Mesquita
- Pablo Rodriguez
- Ulrike Fischer