[NTG-context] Accessibility and Tagged PDFs: Bugs and Feature Requests
Dr. Dominik Klein
Dominik.Klein at outlook.com
Sun Jun 28 12:59:26 CEST 2015
Context is the only TeX-based system that allows one to properly tag a
PDF. Tagged PDFs are a major requirement for accessibility.
Indeed, in several large organizations/universities, accessibility is
mandated by law, and this is a major obstacle to using TeX. In practice,
compliance is often assessed with Acrobat Pro's accessibility check.
Context produces a nice tag structure, but there are some minor issues
that prevent compliance with [1], and hence Acrobat Pro complains during
the check. The main issues are:
1.) Elements that are not contained in the structure tree are not marked
as an artifact. Consider this example:
some more text here
Header, footer, page number etc. will not be included in the tag
structure. Of course this makes perfect sense and is correct; however,
according to Section 188.8.131.52.2 of [1], content that is not in
the structure tree should be marked as an artifact, i.e.
or in an advanced way with /Artifact PropertyList where the type of
artifact can be defined. It would be nice if those elements that are not
included in the tag tree were marked as artifacts by default. The
same holds for \startelement[ignore] when one wants to explicitly remove
something from the structure tree.
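
To make the two forms of artifact marking concrete, here is a minimal
Python sketch that wraps a fragment of a page content stream in
marked-content operators (BMC/BDC ... EMC). It is only an illustration
of what the PDF output could look like, not Context code; the function
name mark_as_artifact and the sample fragment are hypothetical.

```python
from typing import Optional


def mark_as_artifact(content: bytes, artifact_type: Optional[str] = None) -> bytes:
    """Wrap a content-stream fragment so that it is tagged as an artifact.

    Without a type, the simple form "/Artifact BMC ... EMC" is used.
    With a type (e.g. "Pagination"), a property list is attached and the
    BDC operator is used instead of BMC.
    """
    if artifact_type is None:
        header = b"/Artifact BMC"
    else:
        header = b"/Artifact << /Type /" + artifact_type.encode("ascii") + b" >> BDC"
    # EMC closes the marked-content sequence in both forms.
    return header + b"\n" + content.strip() + b"\nEMC\n"


# Hypothetical page-number fragment from a footer.
page_number = b"BT /F1 10 Tf (1) Tj ET"
print(mark_as_artifact(page_number).decode("ascii"))
print(mark_as_artifact(page_number, "Pagination").decode("ascii"))
```

Content wrapped this way stays outside the structure tree, but a
checker can tell that it was left out on purpose.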
2.) Images without alternate text:
According to Section 14.9.3 of [1], alternate descriptions in
human-readable text should be provided for images. It would be really
helpful if these could be defined in the source tex file, and then
automatically added when creating the object in the structure tree.
I.e. it would be nice to have something like:
\externalfigure[cow.pdf][width=10cm][alternate text = "This image shows
a beautiful cow."]
The same holds for formulas: while the MathML-like tagging of Context
is very advanced, it might sometimes still be helpful to supply a
textual description (alt-text = "The definition of the Pythagorean
theorem: a^2 + b^2 = c^2")
3.) Tag names of the resulting tag structure:
Section 14.8.4 of [1] defines standard structure types, such as <H>,
<P>, <Sect> etc. Context creates a tag tree that uses names directly
representing the structure names of the Context language, such as
<sectiontitle>. These should however be mapped to something standard,
such as <H>. Interestingly, these mappings seem to have been considered
in strc-tag.mkiv, but I was unable to generate such a tagged PDF.
Editing/commenting out things in strc-tag.mkiv didn't work for me. It
would be nice if there was a switch somewhere, i.e.
\setuptagging[state=start,tagnames=pdf17] - or maybe I overlooked something?
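
As a rough illustration of the mapping I have in mind, here is a small
Python sketch. Only the pair sectiontitle -> H is taken from the text
above; the other entries and both function names are assumptions made
for the example, and this is not how strc-tag.mkiv works internally.
PDF also allows custom names to be kept and declared in a /RoleMap
dictionary of the structure tree root, which the second function
sketches.

```python
# Hypothetical mapping from Context-style tag names to standard
# structure types; only "sectiontitle" -> "H" comes from the post.
ROLE_MAP = {
    "sectiontitle": "H",
    "section": "Sect",
    "paragraph": "P",
}


def standard_tag(name, role_map=ROLE_MAP):
    """Return the standard structure type for a tag name.

    Names without a mapping are returned unchanged, so they remain
    visible as non-standard tags.
    """
    return role_map.get(name, name)


def role_map_dictionary(role_map=ROLE_MAP):
    """Render the mapping as a PDF /RoleMap dictionary entry."""
    entries = " ".join(f"/{src} /{dst}" for src, dst in sorted(role_map.items()))
    return f"/RoleMap << {entries} >>"


print(standard_tag("sectiontitle"))
print(role_map_dictionary())
```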
4.) Acrobat Pro always complains that the language for the whole
document is not set.
The generated structure looks something like this:
Here, not only are the tag names non-compliant, the tag structure
should also distinguish between the table header (THead) and table rows
(TBody), cf. Section 184.108.40.206.1 of [1]. A simple heuristic would be
to always put the first line into THead tags, and the rest of the table
into TBody tags.
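
The first-row heuristic could look roughly like the following Python
sketch. The nested (tag, children) tuples are just an ad hoc
representation chosen for the example, and the function name is
hypothetical; TH and TD are the standard cell types for header and
data cells.

```python
def tag_table(rows):
    """Build a tag tree for a table given as a list of rows of cell strings.

    Heuristic: the first row becomes the header (THead with TH cells),
    all remaining rows go into the body (TBody with TD cells).
    """
    if not rows:
        return ("Table", [])
    head = ("THead", [("TR", [("TH", cell) for cell in rows[0]])])
    children = [head]
    if len(rows) > 1:
        body_rows = [("TR", [("TD", cell) for cell in row]) for row in rows[1:]]
        children.append(("TBody", body_rows))
    return ("Table", children)


table = tag_table([["Name", "Value"], ["a", "1"], ["b", "2"]])
print(table)
```

A table whose first row is not a header would of course be tagged
wrongly by this, so a way to override the heuristic would be needed.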
6.) It would be nice if a flat tag structure could be created
optionally. This is not a required feature according to [1], and in fact
a properly nested structure is surely preferable for the final output;
for debugging or checking during document creation however, a flat
structure tree sometimes is easier to browse through.
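
What I mean by a flat structure can be sketched in Python as follows:
the nested tree is walked in reading order and only the leaf elements
are kept. The tree representation and the tag names in the sample are
assumptions made for the example.

```python
def flatten(node):
    """Return the leaf elements of a nested tag tree in reading order.

    A node is a (tag, content) pair; content is either a string (leaf)
    or a list of child nodes.
    """
    tag, content = node
    if isinstance(content, str):
        return [(tag, content)]
    flat = []
    for child in content:
        flat.extend(flatten(child))
    return flat


# Hypothetical nested tree as it might appear for a short document.
tree = ("document", [
    ("section", [
        ("sectiontitle", "Introduction"),
        ("paragraph", "some more text here"),
    ]),
])
print(flatten(tree))
```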
All in all, these seem to be the only issues that prevent accessible PDF
documents with Context. For those within an organization where
accessibility is legally required for all publications, compliance with
at least Acrobat Pro's checks is a huge issue. I do not know how
difficult these things are to implement in Context (personally I am just
lost in the code), but looking at e.g. tex.stackexchange
for questions related to accessibility, this is indeed a major obstacle
for several people.
[1] ISO 32000-1:2008, available at