Accessibility and Tagged PDFs: Bugs and Feature Requests
ConTeXt is the only TeX-based system that can properly tag a PDF. Tagged PDFs are a major requirement for accessibility. In several large organizations and universities, accessibility is mandated by law, and this is a major obstacle to using TeX. In practice, compliance is often assessed with Acrobat Pro's accessibility checker.

ConTeXt produces a nice tag structure, but some minor issues prevent compliance with [1], and hence Acrobat Pro complains during the check. The main issues are:

1.) Elements that are not contained in the structure tree are not marked as artifacts. Consider this example:

-------------------------------
\setuptagging[state=start]

\setuppagenumbering
  [location=, alternative=doublesided]

\setupheadertexts
  [{Chapter~\getmarking[chapternumber]\hskip1em\getmarking[chapter]}]
  [{Header Right}]
  [{Header Left}]
  [{Chapter~\getmarking[chapternumber]\hskip1em\getmarking[chapter]}]

\setupfootertexts
  [Organization Name]
  [pagenumber]
  [pagenumber]
  [Organization Name]

\starttext
\startfrontmatter
something
\stopfrontmatter

\startbodymatter
some more text here
\stopbodymatter
\stoptext
-------------------------------

Header, footer, page number etc. are not included in the tag structure. This makes perfect sense and is correct; however, according to Section 14.8.2.2.2 of [1], content that is not in the structure tree should be marked as an artifact, i.e.

/Artifact BMC .. EMC

or, in a more advanced way, with /Artifact PropertyList, where the type of artifact can be defined. It would be nice if elements that are not included in the tag tree were marked as artifacts by default. The same holds for \startelement[ignore] when one wants to explicitly remove something from the structure tree.

2.) Images without alternate text: According to Section 14.9.3 of [1], alternate descriptions in human-readable text should be provided for images. It would be really helpful if these could be defined in the source TeX file and then added automatically when the object is created in the structure tree. I.e. it would be nice to have something like:

\placefigure[top][Image Reference]{Caption}{
  \externalfigure[cow.pdf][width=10cm][alternate text = "This image shows a beautiful cow."]
}

The same holds for formulas: although the MathML-like tagging of ConTeXt is very advanced, it might sometimes still be helpful to supply a textual description (alt-text = "The definition of the Pythagorean theorem: a^2 + b^2 = c^2").

3.) Tag names of the resulting tag structure: Section 14.8.4 of [1] defines standard structure types, such as <H>, <P>, <Sect> etc. ConTeXt creates a tag tree that uses names directly representing the structure names of the ConTeXt language, such as <sectiontitle>. These should, however, be mapped to something standard, such as <H>. Interestingly, these mappings seem to have been considered in strc-tag.mkiv, but I was unable to generate such a tagged PDF. Editing or commenting out things in strc-tag.mkiv didn't work for me. It would be nice if there was a switch somewhere, e.g. \setuptagging[state=start,tagnames=pdf17] - or maybe I overlooked something?

4.) Acrobat Pro always complains that the language for the whole document is not set.

5.) Tables
The generated structure looks something like this:

<table>
  <tablerow>
    <tablecell> ...
  <tablerow>
    <tablecell> ...

Here, not only are the tag names non-compliant; the tag structure should also distinguish between the table header (THead) and the table rows (TBody), cf. Section 14.8.4.3.1 of [1]. A simple heuristic would be to always put the first row into THead tags, and the rest of the table into TBody.

6.) It would be nice if a flat tag structure could be created optionally. This is not a required feature according to [1], and in fact a properly nested structure is surely preferable for the final output; for debugging or checking during document creation, however, a flat structure tree is sometimes easier to browse through.

All in all, these seem to be the only issues that prevent accessible PDF documents with ConTeXt. For those within an organization where accessibility is legally required for all publications, compliance with at least Acrobat Pro's checks is a huge issue. I do not know how difficult these things are to implement in ConTeXt (personally I am just lost in the code), but looking at e.g. tex.stackexchange for questions related to accessibility, this is indeed a major obstacle for several people.

cheers
- Dominik

[1] ISO 32000-1:2008, available at http://www.adobe.com/devnet/pdf/pdf_reference.html
On Sun, Jun 28, 2015 at 12:59 PM, Dr. Dominik Klein < Dominik.Klein@outlook.com> wrote:
___________________________________________________________________________________
Thank you for the report. It would be nice to have a PDF made by ConTeXt using \nopdfcompression that shows all these issues, together with the report emitted by Acrobat. The last time I checked a PDF/A-1a made by ConTeXt it was all OK, but that was some time ago and maybe not all the features were tested deeply.
-- luigi
On Sun, 28 Jun 2015 12:59:26 +0200 "Dr. Dominik Klein" wrote:
2.) Images without alternate text: According to Section 14.9.3 of [1], alternate descriptions in human-readable text should be provided for images. It would be really helpful if these could be defined in the source TeX file and then added automatically when the object is created in the structure tree. I.e. it would be nice to have something like:
\placefigure[top][Image Reference]{Caption}{
  \externalfigure[cow.pdf][width=10cm][alternate text = "This image shows a beautiful cow."]
}

Maybe the syntax could be:

\externalfigure
  [cow]
  [width=10cm,marking={This image shows a beautiful cow}]

(conforming to ConTeXt style)

Alan
on Tue Jun 30 10:32:29 CEST 2015 luigi scarso wrote:
It would be nice to have a PDF made by ConTeXt using \nopdfcompression that shows all these issues, together with the report emitted by Acrobat.

Nice idea.
The document: https://github.com/asdfjkl/tex-access/blob/master/document_acc.tex
Resulting PDF (with \nopdfcompression): https://github.com/asdfjkl/tex-access/blob/master/document_acc.pdf
Report of Acrobat 9 Pro (Menu Advanced -> Accessibility -> Full Check...): https://github.com/asdfjkl/tex-access/blob/master/document_acc_AdobePDF.html

Note that my goal was not to achieve compatibility w.r.t. PDF/A; I focused solely on accessibility (even though the two may be related). It could very well be that I overlooked something and that some functionality is already there in ConTeXt...

cheers
- Dominik
On Tue, Jun 30, 2015 at 5:58 PM, Dominik Klein
Nice, thank you very much.
-- luigi
On 6/28/2015 12:59 PM, Dr. Dominik Klein wrote:
2.) Images without alternate text: According to Section 14.9.3 of [1], alternate descriptions in human-readable text should be provided for images. It would be really helpful if these could be defined in the source TeX file and then added automatically when the object is created in the structure tree. I.e. it would be nice to have something like:

i'll pass the label to the tag as alt text

\externalfigure[t:/sources/cow.pdf][label=whatever]

(a relatively simple extension as we already have label as well as alt in images)

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
On 6/28/2015 12:59 PM, Dr. Dominik Klein wrote:
/Artifact BMC .. EMC
or in an advanced way with /Artifact PropertyList where the type of Artifact can be defined. It would be nice if those elements that are not included in the tag tree would be marked as artifacts by default. The same holds for \startelement[ignore] when one wants to explicitly remove something from the structure tree.
i'll add the simple variant (i see no need to add properties to something that is supposed to be ignored anyway)

Hans Hagen
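To make the "simple variant" concrete, here is a minimal sketch of how one might check an uncompressed page content stream for text that sits outside any marked-content sequence. It is only an illustration: the function name and the two sample streams are made up for this example, and the whitespace tokenizer is a deliberate simplification (a real content-stream parser must handle strings, arrays and inline images properly).

```python
# Text-showing operators in PDF content streams.
TEXT_SHOWING = {"Tj", "TJ", "'", '"'}


def untagged_text_ops(content_stream):
    """Return token indexes of text-showing operators that are not
    enclosed in any BMC/BDC ... EMC marked-content sequence.

    Simplification (assumption): tokens are split on whitespace, so
    string operands containing spaces are not parsed as one token.
    """
    depth = 0
    hits = []
    for index, token in enumerate(content_stream.split()):
        if token in ("BMC", "BDC"):
            depth += 1          # entering a marked-content sequence
        elif token == "EMC":
            depth = max(depth - 1, 0)
        elif token in TEXT_SHOWING and depth == 0:
            hits.append(index)  # text drawn outside marked content
    return hits


# Hypothetical header text, first unmarked, then wrapped as an artifact.
header_plain = "BT /F1 10 Tf (Header) Tj ET"
header_artifact = "/Artifact BMC BT /F1 10 Tf (Header) Tj ET EMC"

print(untagged_text_ops(header_plain))
print(untagged_text_ops(header_artifact))
```

The first stream reports one unmarked text operator; wrapping the same operators in /Artifact BMC ... EMC makes the list empty, which is the state the checker expects for headers, footers and page numbers.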
On 6/28/2015 12:59 PM, Dr. Dominik Klein wrote:
3.) Tag names of the resulting tag structure: Section 14.8.4 of [1] defines standard structure types, such as <H>, <P>, <Sect> etc. ConTeXt creates a tag tree that uses names directly representing the structure names of the ConTeXt language, such as <sectiontitle>. These should, however, be mapped to something standard, such as <H>. Interestingly, these mappings seem to have been considered in strc-tag.mkiv, but I was unable to generate such a tagged PDF. Editing or commenting out things in strc-tag.mkiv didn't work for me. It would be nice if there was a switch somewhere, e.g. \setuptagging[state=start,tagnames=pdf17] - or maybe I overlooked something?
The set of those standard tags is rather limited and imo one of the craziest things in pdf as we then end up with abuse of those html tags (and probably endless discussions on what to map onto what). I don't even have a clue what it would add to the concept either. Reflow is a braindead thing anyway.
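For readers unfamiliar with the mechanism under discussion: PDF lets a document keep its own tag names and declare, in the structure tree root's RoleMap entry, which standard structure type each one corresponds to. The sketch below only shows what such an entry could look like. The mapping itself is hypothetical (which ConTeXt element should map onto which standard type is exactly the open question raised above), and the names "paragraph" and "section" are assumptions, not taken from the thread.

```python
# Hypothetical mapping from ConTeXt-style tag names to standard
# structure types of ISO 32000-1. Illustrative only.
ROLE_MAP = {
    "sectiontitle": "H",
    "paragraph": "P",     # assumed tag name
    "section": "Sect",    # assumed tag name
    "table": "Table",
    "tablerow": "TR",
    "tablecell": "TD",
}


def role_map_entry(mapping):
    """Serialize a tag-name mapping as a PDF /RoleMap dictionary entry."""
    pairs = " ".join(f"/{custom} /{standard}"
                     for custom, standard in sorted(mapping.items()))
    return f"/RoleMap << {pairs} >>"


print(role_map_entry(ROLE_MAP))
```

With a role map the custom names stay visible in the tag tree, so nothing is lost for the export, while a checker can still resolve each tag to a standard type.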
4.) Acrobat Pro always complains that the language for the whole document is not set.
I don't have the latest version of pro (a bit expensive for the few times that i need it - when we have to produce pdf it always has to be rather old fashioned as printing houses want pdf from the previous century).
5.) Tables
The generated structure looks something like this:
<table>
  <tablerow>
    <tablecell> ...
  <tablerow>
    <tablecell> ...
Here, not only are the tag names non-compliant; the tag structure should also distinguish between the table header (THead) and the table rows (TBody), cf. Section 14.8.4.3.1 of [1]. A simple heuristic would be to always put the first row into THead tags, and the rest of the table into TBody.

Hm. It's just structure so I'm not sure what compliant means. If someone wants an html representation then it's better to use the export and apply some transformation on the generic structures (one that matches expectations, which can differ). When we start tagging tables in detail in pdf we probably also need to add all kinds of extra attributes and then we need to do that for more than tables. It's not so much impossible (as most info is present) but more an extremely boring thing to do and no (free) application uses that info anyway.
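The first-row heuristic proposed in point 5 is small enough to sketch. This is an illustration under stated assumptions, not how ConTeXt represents tables internally: rows are modelled as plain lists of cell strings, and the function name is invented for the example.

```python
def split_header(rows):
    """Apply the first-row heuristic: the first row becomes the table
    header group (THead), all remaining rows the body group (TBody)."""
    return {"THead": rows[:1], "TBody": rows[1:]}


# Hypothetical table: one header row followed by two data rows.
table = [
    ["Animal", "Sound"],
    ["cow", "moo"],
    ["duck", "quack"],
]

grouped = split_header(table)
print(grouped["THead"])
print(grouped["TBody"])
```

The heuristic is wrong for tables without a header row or with multi-row headers, which is one reason an explicit switch in the source would be more reliable than guessing.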
6.) It would be nice if a flat tag structure could be created optionally. This is not a required feature according to [1], and in fact a properly nested structure is surely preferable for the final output; for debugging or checking during document creation however, a flat structure tree sometimes is easier to browse through.
I'm not sure what is meant with flat.
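One possible reading of "flat", offered as an assumption since the request does not define it: every structure element is listed once, in document order, without nesting. The sketch below models a tag tree as nested (tag, children) tuples and flattens it that way; the tree shape and the tag names other than sectiontitle are hypothetical.

```python
def flatten(node):
    """Return the tags of a nested (tag, children) tree as a flat list
    in document order (pre-order traversal)."""
    tag, children = node
    flat = [tag]
    for child in children:
        flat.extend(flatten(child))
    return flat


# Hypothetical nested tag tree for a document with one section.
tree = ("document", [
    ("section", [
        ("sectiontitle", []),
        ("paragraph", []),
        ("paragraph", []),
    ]),
])

print(flatten(tree))
```

Such a list is easier to scan when checking whether every piece of content received a tag, at the price of losing the parent/child information that the nested tree carries.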
All in all, these seem to be the only issues that prevent accessible PDF documents with ConTeXt. For those within an organization where accessibility is legally required for all publications, compliance with at least Acrobat Pro's checks is a huge issue. I do not know how difficult these things are to implement in ConTeXt (personally I am just lost in the code), but looking at e.g. tex.stackexchange for questions related to accessibility, this is indeed a major obstacle for several people.

In fact adding pdf tagging to context was rather easy. Some time was spent on getting it done efficiently but it's a rather non-intrusive bit of code. Once it was done, I only cleaned it up a bit when the export option was added (as some code is shared), but I have to admit that I never use it. Luigi and I did look into properties a while ago and that was added then. So, it's not that difficult to add features, more a matter of priorities and motivation (apart from the fact that my acrobat is a bit old by now so I cannot really test).

Hans
participants (5)
- Alan BRASLAU
- Dominik Klein
- Dr. Dominik Klein
- Hans Hagen
- luigi scarso