
Hi Max,
On Thu, 2025-07-10 at 10:54 +0200, Hans Hagen via ntg-context wrote:
On 7/10/2025 9:08 AM, Max Chernoff via ntg-context wrote:
sorry ... long answer ... probably much already said once
- I had to manually tag every paragraph with \startparagraph/\stopparagraph, which was annoying.
well, that also depends on the concept of paragraph, I guess .. one can't make structure from non-structure
Right, but any text outside a tagged structure automatically fails validation, so incorrectly tagged paragraphs are better than nothing. And "indentnext=auto" usually does a decent job visually, so you could probably use the same heuristics for tagging.
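For anyone following along, the manual tagging being discussed looks roughly like this (a minimal sketch against a current ConTeXt LMTX; the commands are the ones named in this thread, but exact tag output depends on backend settings):

```tex
% pick the format first, then enable tagging (order matters, see below)
\setupbackend[format=pdf/ua-2]
\setuptagging[state=start]

\starttext

\startparagraph
    The first paragraph, explicitly marked so that the tagged
    PDF gets a proper paragraph structure element.
\stopparagraph

\startparagraph
    The second paragraph; without these commands the text would
    sit outside any tagged structure and fail validation.
\stopparagraph

\stoptext
```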
The problem is that when we implement(ed) things like that, we have to make assumptions. I'm not saying that there are no alternatives (it could be optional), but what belongs to a 'paragraph' is a bit arbitrary; even in your example you have two paragraphs and a formula counted as one paragraph. Heuristics can always fail. And keep in mind that in tex anything can end up anywhere: no endpoints are defined unless one starts blocking nested tagging! I probably need to go over the now rather ancient code and might see some ways to improve things (also performance-wise) by using some more luametatex features. Also, we never had any real usage and examples, and I'm not going to tag and check out of hobby. The basics were implemented long ago (see the tugboat article) and the logic is not complex, mostly annoying at the pdf level. I even dare to say it was kind of trivial because we already had structure to hook into (call it luck). Adding a few things here and there is not complex either, just boring.

Furthermore:

- only acrobat showed something, but the last version I had (and paid for) was Pro 8, which at some point I lost after updating a laptop; the fact that tagging was never supported well elsewhere tells something

- the standard is a mess and tagging is pretty bad imo, maybe derived from how this (non-structure-related) tagging was present / needed in adobe applications (valid in itself, as that is where pdf came from) that used it for storing / editing purposes (there's more to tell about that, but it's ancient history by now)

- things and interpretations and standards still change .. so much for a standard ... so at one point we validate and at another we don't (or refuse to use tricks that suit specific programs) .. well, if someone barks that context does not do it right, they are rewriting history and as a side effect removing themselves from my 'maybe worth listening to' list (which is actually true too when I read ridiculous comments and assumptions wrt context in more public places) [btw, I'm aware that you know more about context inner workings and intentions than the average tex user, so thanks for occasionally clarifying that on e.g. SE]

- so when we have time and motivation we pick up on it and adapt ... just that, adapt ... but within the reasonable (so I adapted the nested mcid handling and made all NonStruct into Span or Div .. we'll see); it will take years before that dust has settled

And no, I'm unlikely to add all kinds of pseudo html directives, css or other crap to the file ... then one should just make html instead.
- "\setupbackend[format=pdf/ua-2]" needs to come before "\setuptagging[state=start]", otherwise lots of stuff will silently break.
Indeed, the format influences some later settings (arbitrary-order initializations would complicate the code with no real gain).
Right, this is more of a note in case anyone else ever runs into this issue, but adding a warning might be a good idea (although maybe not worth the effort).
I think it's mentioned somewhere but I can check. Maybe issue a warning when the order is flipped. It has to do with enabling / disabling features, and doing that in an arbitrary order is kind of messy. (I bet you know why by looking at the code.)
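To make the ordering issue concrete, a preamble along these lines works (a sketch; both commands are the ones named above), while swapping the two lines silently loses the PDF/UA machinery:

```tex
% correct: fix the output format first ...
\setupbackend[format=pdf/ua-2]
% ... then enable tagging, which picks up the format's requirements;
% in the reverse order tagging initializes without the ua-2 settings
\setuptagging[state=start]
```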
and it comes for free (so it's not driven by some paid project that can set priorities)
There are the TUG development and accessibility funds
https://tug.org/tc/devfund/grants.html
https://tug.org/twg/accessibility/
but the grants tend to be quite small, especially considering how complicated/annoying the accessibility work is.
We (I) never work with grants from user groups, also because I'm involved in some and I want to avoid conflicts of interest. We did occasionally try to get funding for other tex projects (like the font projects, mplib by taco, swiglib by luigi, some work by idris).

It is kind of interesting that the large-scale users are totally absent: large publishers (I think they lost interest long ago; there used to be tex people there; I can't access their content anyway, so why should I care), their providers (I know only a small dutch one that I have contact with and that is involved in some font stuff - tricky scripts - in context; the rest are just consumers that make money from tex and expect it to be around and developed; they normally sit on their technology anyway), and service providers (where we were told corporate / investor policy drives the processes; again, just assuming tex keeps developing). It says something that in the decades we have developed luatex none of those critically dependent-on-tex entities ever contacted the developers (just to be sure of continuity, to know those involved, as they have quite a dependency) .. well, they don't care, so they can have it .. for me it's only users that matter, and they normally don't have the funds, so basically we pay for it all out of our own pocket.

Get me right, I'm talking substantial here, not some user group membership or a few K. Companies that pay 5-7 digit salaries for developers don't really care about the tex part that much apart from using it. It's the small-scale long-term context users who keep it afloat, by being enthusiastic, challenging and friends. That's what we keep doing it for, not because large-scale users realize that dependency. So, unless I run into specific large-scale professional context users, publisher / corporate usage of tex is non-existent for me (and a waste of time; the point of no return has passed).
And I don't expect user groups to fund anything; they should just focus on staying around to provide the basic support, archiving, distributions, care for domestic language and script demands, and maybe journals. Small tech, so to say. (Years ago we sometimes had bits and pieces of context dev being paid work as it was needed: specific kinds of tables. And these are not IT-conforming projects anyway, money-wise.)

I only met very few professional large-scale users (different mindset wrt software dev too), but they knew their way around and were not your average publishing kind of people. They also put their jobs at risk by going for tex solutions. But long-term continuity is bad, as organizations move and merge. We noticed the same with some educational publishers: merge, ditch, profit-driven (instead of a full-range content-covering approach). Very few exceptions, alas.

FWIW: quite often in our projects tex was a last resort ... all else had failed ... big money spent ... so then they were willing to accept a cheap tex solution, even if past experiences were bad (some knew tex from their education and somehow never saw it as a valid solution), but again we had to develop the technology beforehand. We only say we can do something if we know we can (which is why we remained small). Kind of rewarding too: implementing solutions with maverick, often interesting people. But it's always upfront free development then applied to hourly work (I bet others in the same tex business can confirm that more hours go in than get paid for, as there are always tricky detailed demands involved; one seldom gets the simple many-pages stupid rendering: that goes to those in hiding). So funding ...
As I mentioned, tagging itself is kind of trivial, but adapting, making decisions and testing takes time, and for that we need projects in order to prioritize it, unless it's a fun project (like Mikael S's - especially the upcoming - lecture notes, which are also artistic and educational masterpieces, so worth spending time on). Users probably don't realize how much is done just because I interact with users that have challenges or think in a way that fits our way of thinking. Like: we're currently doing some mp-related coding and it will never pay back, but it's a nice challenge, with nice discussions, and it can be artistic fun too. Of course it can drive high-performance workflows, but tex and friends never fit into the solution space there.

One can argue the same for many mechanisms: improved math (only of interest to context users who notice the difference; publishers etc. don't care .. what has been good-enough hackery for decades is good enough forever; for journals it gets fixed in post-production anyway, so money can be made), better par building (idem, probably only some context users appreciate that), whatever we improve in the engine (also programming capabilities so that source code looks better) .. a lot is about 'feel good' (okay for me).
- I've heard that it's actually usually better to put the TeX source in the Alt text for math instead of the current generated prose, because most people reading math are familiar with TeX anyways.
Well, I never meet people who are familiar with latex input ... or expect that from us ... do you expect me to generate latex math from the less verbose context math? And what about all kinds of (educational) stuff inside there? We try to accommodate what users expect and challenge us with, because that's the world we deal with. Latex is just a different world (to us); little or no overlap.
LaTeX and ConTeXt inline math (_not_ display math) syntaxes are essentially identical, since they're both essentially "Plain TeX with \frac instead of \over", so I don't think that many LaTeX users would struggle with ConTeXt's syntax.
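As a trivial illustration of that claim, the same inline formula is valid source in both systems (only the surrounding document commands differ):

```tex
% identical inline math in LaTeX and ConTeXt:
$f(x) = \frac{1}{1 + e^{-x}}$
% both read Plain TeX math syntax; the visible difference is that
% both macro packages prefer \frac over the bare \over primitive
```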
Well, inline should be trivial, as we have unicode math symbols. It's unfortunate that we cannot tag math sequences (as with bidi) and that some alphabets have gaps. I think the tex community failed big here and still does. But I'm not in any loop .. remember: context is not supposed to do / be used for math, that's the persistent narrative.

Also, I don't think the majority of our users care about latex or whatever. They just ran into context and either didn't like it and left, or saw the use and fun of it and stayed. No one is forced to use tex (or any syntax). For me latex is to context what msword is to latex: often sort of an annoyance. Different worlds and mindsets too. (And we also have to be immune to some bashing: hard to install, always evolving, slow, weird version numbers, needs scripts (why lua), not that many users compared to latex, no math (needed because no articles), no manuals, huge showstopping bugs (in the engine) while users happily use it anyway, assumptions about how we think and why we decide as we do, etc. The usual social media crap. But still: lua(meta)tex comes from the context end, right? Users can use and run it, indeed? We write about it and make all (!) of it public, yes? No one is forced to use it, so?)

Concerning your comment: latex users (I suppose we are talking about those who publish articles) never run into context documents, and if they have the need for a better readable version, context can generate one; when they lack eyesight, sources can be provided, or we can discuss how to accommodate that ... the stupid tagging in pdf is pretty suboptimal (and likely also commercially and politically driven). We're not in that world; bad for one's health and mindset. Also, it is kind of weird to impose something (this EU law thing) that is not stable and will in the end lead to much invalid (intermediate) stuff (okay, money to be made by fixing it). Fighting windmills.
And the embedded xml blob is probably more reliable than any context -> latex math conversion. When it comes to math, I think most context users are in education, so that's what we focus on.
Yes, I also agree that focusing on MathML is probably the best way forwards.
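For readers unfamiliar with the content/presentation distinction that comes up in this thread, here is the same formula x = a/b written both ways (hand-written, illustrative fragments only; ConTeXt's actual export may differ):

```xml
<!-- presentation MathML: describes layout (a fraction bar) -->
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mi>x</mi><mo>=</mo>
  <mfrac><mi>a</mi><mi>b</mi></mfrac>
</math>

<!-- content MathML: describes meaning (an equality and a division) -->
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply><eq/>
    <ci>x</ci>
    <apply><divide/><ci>a</ci><ci>b</ci></apply>
  </apply>
</math>
```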
Although even there it's a mess: MathML support came to, went from, and came back to browsers (mathjax filled that gap well, I think; AsciiMath is a bit of a pain). That some of it was dropped from mathml because it was apparently hard to implement tells more about the programs, I guess. One can wonder why that took so long anyway. I saw some drafts that puzzle me. So far we could always adapt, but it doesn't get prettier over time. I like content mathml: predictable. It looked like openmath would follow it up, but that was a failure. Presentation mathml was always kind of dumb and stays that way, although they smuggle in some bits from content. But we'll adapt, and have to accept the 'older docs and versions are not doing ok' rants. If 30 years of history is representative, imagine the next 10 years. Ever been to a typesetting museum that spans the last 50 years?

The best one can do is just keep producing nice-looking documents and hope for the best. So rendering is what we focus on; that is the (small but appreciating) audience we have. Just watch the youtube videos about the voyager missions and how they fix things at a distance, or the apollo lander computers .. much we brag about today was (conceptually) invented then. It's the time I picked up on computing. And tex is that old, and still kicking, so let's keep it alive.
You need to keep in mind that when we started with all this there were no programs that did anything useful with tagging,
Viewer support is still very weak---MathML only works with Foxit and a version of NVDA released less than a month ago.
I never used those viewers and settled for sumatra on windows, and okular on windows and linux. All platforms? I suppose there is the usual interaction between the standard (adapting it), pdf-processing programs (can or can't or don't want to do it) and pdf generation (no comments there).
that the spec was (and is) not stable, that validation is a moving target, etc. ... it's all about adaptation, and it's always easy to point out whatever without looking at the past and the reality one has / had to deal with.
Yup, the very recent PDF 2.0 specification defines the <H> tags, and then the UA-2 spec arbitrarily decides that those are now invalid.
Exactly. And the sectioning problem is not solved. It's the usual: one starts from high-school writings, so a few sections, some itemize, a simple toc, maybe a figure, then a simple table. The same for typesetting: these are easy, and then comes the rest. Combine that with today's (social media) advertising of "we're the best and will do better" and the usual "free, professional, enterprise support" options (and hope for the best when they sell themselves or merge) and you will understand why I seldom (or never) check those out. My motto is "the problem doesn't change", and maybe there are plenty of ways to reach some goal; no need to compete and get the most users. We don't want unhappy enforced users (or at least not to be confronted by that fact). There are plenty of potential users out there as long as they are free to choose and not harassed by tex or competing evangelists (comparing and bragging about being better is often a sign of loss and desperation anyway; fine when a user says it, but a developer ...). One should just use what one likes best.
A few decades from now all this tagging will probably be seen as kind of ridiculous anyway.
Yeah, I'm also fairly skeptical of all this PDF tagging stuff, mainly because it seems much more compliance-driven than accessibility-driven. But it's also a classic chicken-and-egg problem between viewer support and document support, so if/when viewer support gets better, it should be much more useful.
Sure, but releasing a spec before actually implementing and playing with it ... I tend to follow the 'third attempt is the best' approach, so I try not to be driven by release fever. By going ISO, some long-term stability was signalled; the amount of patching and explaining since is, to me, alarming. Of course there is new stuff, like the balancing mvl that some play with, but it will likely be official around the meeting, more than a year after we started concentrating on it: first we make some documents that stress all of it, then as usual it can become stable and maybe occasionally be improved .. we're not in some competition. (Even the new par building and math are not used to full extent by users; much isn't even advocated but only mentioned in low-level manuals; expect no sales pitches.)

btw, about these nested mcids .. they come from the fact that we wanted to support math labels in metapost output ... we now just disable that, because after all, that math is meaningless without the drawing. The easy solutions: if it doesn't work, just disable it, or claim that it was unintended usage. But I admit that one should say that beforehand, not when a user runs out of luck. Normally we try to solve it and avoid loose ends. Try to predict extreme usage patterns (comes with age, I guess).

Hans

ps. Sorry, too long and too many typos (working on a large screen at 3 m distance).

ps. About mathml embedding: we already had something like that. We also did that with tables in the good old pdftex times: embed tables as excel xml .. one could just click on it and it worked fine; it's probably still there (on my machine) .. the good old times of hundreds of thousands of hyperlinks and so on ... 2500+ page documents ...
It's easier today and also faster, but it's not like tex was ever behind (adobe nl representatives used some context docs to demonstrate possibilities of pdf that they couldn't render themselves).

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------