On 16 May 2022, at 18:50, Pablo Rodriguez via ntg-context wrote:

> On 5/16/22 17:30, Hans van der Meer via ntg-context wrote:
>> Can't you use an editor with grep, searching for something like the pattern
>> ?
>
> Many thanks for your reply, dr. van der Meer.
>
> If I want to typeset the whole book (https://seumasjeltzz.github.io/LinguaeGraecaePerSeIllustrata/), I will have to download and sanitize over 20 HTML files.

That can be done with a couple of command lines. xmllint usually does a good
job of cleaning up dodgy HTML input:
xmllint --html --xmlout
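
For the 20-odd downloaded files, the call above could be wrapped in a small loop. What follows is only a minimal sketch, not a tested recipe for this particular book: the function names `xmllint_command` and `clean_html_dir`, the `*.html` glob, and the directory layout are all assumptions made for illustration. It uses the `--html` and `--xmlout` options shown above plus xmllint's `--output` option to name the result file.

```python
import subprocess
from pathlib import Path


def xmllint_command(html_file, xml_file):
    """Build the xmllint call: parse as HTML, serialize as XML into xml_file."""
    return ["xmllint", "--html", "--xmlout",
            "--output", str(xml_file), str(html_file)]


def clean_html_dir(src_dir, dst_dir, run=subprocess.run):
    """Run xmllint on every *.html file in src_dir, writing .xml files to dst_dir.

    `run` defaults to subprocess.run; it is a parameter only so that the loop
    can be tried out without xmllint installed. Returns the output paths.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for html_file in sorted(src_dir.glob("*.html")):
        xml_file = dst_dir / (html_file.stem + ".xml")
        # check=False: one problematic file should not abort the whole batch.
        run(xmllint_command(html_file, xml_file), check=False)
        written.append(xml_file)
    return written
```

Calling `clean_html_dir("html", "xml")` would then process every file in a (hypothetical) `html` directory and leave the cleaned-up results in `xml`, ready to be fed to ConTeXt. The output still needs checking by eye, since xmllint repairs the markup as best it can rather than guaranteeing the structure you expect.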

> It is really a pity that ConTeXt cannot totally ignore any given XML elements.

This statement is a little unfair: the problem is exactly that your input is NOT proper XML. If it were proper XML, ConTeXt would not have problems with it. ConTeXt explicitly has the capability to handle XML files, which your input simply is not. In fact, it is sloppy HTML-esque data that modern web browsers happen to be able to handle more or less correctly. It is not valid HTML either, because valid HTML has to be valid SGML, which your input clearly is not.

That said, tools like xmllint exist for this stuff. Just write a small batch driver file in some scripting language ((power)shell, lua, python, perl, etc.) to preprocess the HTML stuff into clean XML, and you should be fine.

Taco

--
Taco Hoekwater
E: taco@bittext.nl
genderfluid (all pronouns)