[NTG-context] ignore not closed tags in XML input

Pablo Rodriguez oinos at gmx.es
Sat May 21 19:01:54 CEST 2022


On 5/18/22 19:14, Thangalin via ntg-context wrote:
> Hey Pablo,
>
>> One of the not irrelevant tasks for me is finding examples of XML code.
>
> To clarify, XHTML documents /are/ XML documents. XHTML happens to use a
> standardized set of XML element and attribute names. All XHTML examples
> are also XML examples.

Hi Dave,

many thanks for the explanation.

>> But my worries came from having to sanitize HTML sources (which aren’t
>
> That was discussed in the blog post: finding a source of well-formed
> XHTML documents. There are a number of tools to sanitize HTML, as
> mentioned in the thread. KeenWrite uses the Java-based JSoup library
> https://jsoup.org/ to sanitize HTML and then create
> an XHTML version.

After dealing with other (X)HTML sources, I have found that quite a
few of them contain sloppily encoded data (as Taco pointed out).

There are even some tag mismatches that xmllint doesn’t fix automatically
(as Taco also mentioned).

Now I understand that I will also have to curate the XML sources so that
they are tidy before typesetting them with ConTeXt.
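
A minimal sketch of how such a check might look, assuming Python's
standard xml.etree.ElementTree parser (an illustration only, not a tool
mentioned in this thread; the helper name and the sample strings are
hypothetical). It only reports whether a source is well-formed XML, so
that the ones needing manual curation can be spotted early:

```python
import xml.etree.ElementTree as ET


def well_formedness_error(source):
    """Return None if `source` parses as XML, else the parser's message."""
    try:
        ET.fromstring(source)
    except ET.ParseError as err:
        return str(err)
    return None


# Hypothetical samples: the second one leaves <br> unclosed, which is
# acceptable in HTML but not in XML.
samples = {
    "closed": "<p>one<br/>two</p>",
    "unclosed": "<p>one<br>two</p>",
}

for name, text in samples.items():
    problem = well_formedness_error(text)
    if problem is None:
        print(name, "is well-formed")
    else:
        print(name, "needs fixing:", problem)
```

This only detects problems; it does not repair them, so anything it
flags would still have to be fixed by hand or with a sanitizer.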

Many thanks for your help again,

Pablo

