
Hi Max,
On Thu, 2025-07-10 at 10:54 +0200, Hans Hagen via ntg-context wrote:
On 7/10/2025 9:08 AM, Max Chernoff via ntg-context wrote:
sorry ... long answer ... probably much already said once
- I had to manually tag every paragraph with \startparagraph/\stopparagraph, which was annoying.
well, that also depends on the concept of paragraph, I guess .. one can't make structure from non-structure
Right, but any text outside a tagged structure automatically fails validation, so incorrectly tagged paragraphs are better than nothing. And "indentnext=auto" usually does a decent job visually, so you could probably use the same heuristics for tagging.
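For anyone following along, the manual tagging being discussed looks roughly like this (a minimal sketch against a current ConTeXt LMTX; the commands are the ones named in this thread, but exact tag output depends on backend settings):

```tex
% pick the format first, then enable tagging (order matters, see below)
\setupbackend[format=pdf/ua-2]
\setuptagging[state=start]

\starttext

\startparagraph
    The first paragraph, explicitly marked so that the tagged
    PDF gets a proper paragraph structure element.
\stopparagraph

\startparagraph
    The second paragraph; without these commands the text would
    sit outside any tagged structure and fail validation.
\stopparagraph

\stoptext
```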
The problem is that when we implement(ed) things like that, we have to make assumptions. I'm not saying that there are no alternatives (it could be optional), but what belongs to a 'paragraph' is a bit arbitrary; even in your example you have two paragraphs and a formula counted as one paragraph. Heuristics can always fail. And keep in mind that in tex anything can end up anywhere: no endpoints are defined unless one starts blocking nested tagging! I probably need to go over the now rather ancient code and might see some ways to improve things (also performance-wise) by using some more luametatex features. Also, we never had any real usage and examples, and I'm not going to tag and check out of hobby. The basics were implemented long ago (see the tugboat article) and the logic is not complex, mostly annoying at the pdf level. I even dare to say it was kind of trivial because we already had structure to hook into (call it luck). Adding a few things here and there is not complex either, just boring.

Furthermore:

- only acrobat showed something, but the last version I had (and paid for) was Pro 8, which at some point I lost after updating a laptop; the fact that tagging was never supported well elsewhere tells something

- the standard is a mess and tagging is pretty bad imo, maybe derived from how this (non-structure-related) tagging was present / needed in adobe applications (valid in itself, as that is where pdf came from) that used it for storing / editing purposes (there's more to tell about that, but it's ancient history by now)

- things and interpretations and standards still change .. so much for a standard ... so at one point we validate and at another we don't (or refuse to use tricks that suit specific programs) .. well, if someone barks that context does not do it right, they are rewriting history and as a side effect removing themselves from my 'maybe worth listening to' list (which is actually true too when I read ridiculous comments and assumptions wrt context in more public places) [btw, I'm aware that you know more about context inner workings and intentions than the average tex user, so thanks for occasionally clarifying that on e.g. SE]

- so when we have time and motivation we pick up on it and adapt ... just that, adapt ... but within the reasonable (so I adapted the nested mcid handling and made all NonStruct into Span or Div .. we'll see); it will take years before that dust has settled

And no, I'm unlikely to add all kinds of pseudo html directives, css or other crap to the file ... then one should just make html instead.
- "\setupbackend[format=pdf/ua-2]" needs to come before "\setuptagging[state=start]", otherwise lots of stuff will silently break.
Indeed, the format influences some later settings (arbitrary-order initializations would complicate the code with no real gain).
Right, this is more of a note in case anyone else ever runs into this issue, but adding a warning might be a good idea (although maybe not worth the effort).
I think it's mentioned somewhere but I can check. Maybe issue a warning when the order is flipped. It has to do with enabling / disabling features, and doing that in an arbitrary order is kind of messy. (I bet you know why by looking at the code.)
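To make the ordering issue concrete, a preamble along these lines works (a sketch; both commands are the ones named above), while swapping the two lines silently loses the PDF/UA machinery:

```tex
% correct: fix the output format first ...
\setupbackend[format=pdf/ua-2]
% ... then enable tagging, which picks up the format's requirements;
% in the reverse order tagging initializes without the ua-2 settings
\setuptagging[state=start]
```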
and it comes for free (so it's not driven by some paid project that can set priorities)
There are the TUG development and accessibility funds
https://tug.org/tc/devfund/grants.html
https://tug.org/twg/accessibility/
but the grants tend to be quite small, especially considering how complicated/annoying the accessibility work is.
We (I) never work with grants from user groups, also because I'm involved in some and I want to avoid conflicts of interest. We did occasionally try to get funding for other tex projects (like the font projects, mplib by taco, swiglib by luigi, some work by idris).

It is kind of interesting that the large-scale users are totally absent: large publishers (I think they lost interest long ago; there used to be tex people there; I can't access their content anyway, so why should I care), their providers (I know only a small dutch one that I have contact with and that is involved in some font stuff - tricky scripts - in context; the rest are just consumers that make money from tex and expect it to be around and developed; they normally sit on their technology anyway), and service providers (where we were told corporate / investor policy drives the processes; again, just assuming tex keeps developing). It says something that in the decades we have developed luatex none of those critically dependent-on-tex entities ever contacted the developers (just to be sure of continuity, to know those involved, as they have quite a dependency) .. well, they don't care, so they can have it .. for me it's only users that matter, and they normally don't have the funds, so basically we pay for it all out of our own pocket.

Get me right, I'm talking substantial here, not some user group membership or a few K. Companies that pay 5-7 digit salaries for developers don't really care about the tex part that much apart from using it. It's the small-scale long-term context users who keep it afloat, by being enthusiastic, challenging and friends. That's what we keep doing it for, not because large-scale users realize that dependency. So, unless I run into specific large-scale professional context users, publisher / corporate usage of tex is non-existent for me (and a waste of time; the point of no return has passed).
And I don't expect user groups to fund anything; they should just focus on staying around to provide the basic support, archiving, distributions, care for domestic language and script demands, and maybe journals. Small tech, so to say. (Years ago we sometimes had bits and pieces of context dev being paid work as it was needed: specific kinds of tables. And these are not IT-conforming projects anyway, money-wise.)

I only met very few professional large-scale users (different mindset wrt software dev too), but they knew their way around and were not your average publishing kind of people. They also put their jobs at risk by going for tex solutions. But long-term continuity is bad, as organizations move and merge. We noticed the same with some educational publishers: merge, ditch, profit-driven (instead of a full-range content-covering approach). Very few exceptions, alas.

FWIW: quite often in our projects tex was a last resort ... all else had failed ... big money spent ... so then they were willing to accept a cheap tex solution, even if past experiences were bad (some knew tex from their education and somehow never saw it as a valid solution), but again we had to develop the technology beforehand. We only say we can do something if we know we can (which is why we remained small). Kind of rewarding too: implementing solutions with maverick, often interesting people. But it's always upfront free development then applied to hourly work (I bet others in the same tex business can confirm that more hours go in than get paid for, as there are always tricky detailed demands involved; one seldom gets the simple many-pages stupid rendering: that goes to those in hiding). So funding ...
As I mentioned, tagging itself is kind of trivial, but adapting, making decisions and testing takes time, and for that we need projects in order to prioritize it, unless it's a fun project (like Mikael S's - especially the upcoming - lecture notes, which are also artistic and educational masterpieces, so worth spending time on). Users probably don't realize how much is done just because I interact with users that have challenges or think in a way that fits our way of thinking. Like: we're currently doing some mp-related coding and it will never pay back, but it's a nice challenge, with nice discussions, and it can be artistic fun too. Of course it can drive high-performance workflows, but tex and friends never fit into the solution space there.

One can argue the same for many mechanisms: improved math (only of interest to context users who notice the difference; publishers etc. don't care .. what has been good-enough hackery for decades is good enough forever; for journals it gets fixed in post-production anyway, so money can be made), better par building (idem, probably only some context users appreciate that), whatever we improve in the engine (also programming capabilities so that source code looks better) .. a lot is about 'feel good' (okay for me).
- I've heard that it's actually usually better to put the TeX source in the Alt text for math instead of the current generated prose, because most people reading math are familiar with TeX anyways.
Well, I never meet people who are familiar with latex input ... or expect that from us ... do you expect me to generate latex math from the less verbose context math? And what about all kinds of (educational) stuff inside there? We try to accommodate what users expect and challenge us with, because that's the world we deal with. Latex is just a different world (to us); little or no overlap.
LaTeX and ConTeXt inline math (_not_ display math) syntaxes are essentially identical, since they're both essentially "Plain TeX with \frac instead of \over", so I don't think that many LaTeX users would struggle with ConTeXt's syntax.
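As a trivial illustration of that claim, the same inline formula is valid source in both systems (only the surrounding document commands differ):

```tex
% identical inline math in LaTeX and ConTeXt:
$f(x) = \frac{1}{1 + e^{-x}}$
% both read Plain TeX math syntax; the visible difference is that
% both macro packages prefer \frac over the bare \over primitive
```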
Well, inline should be trivial, as we have unicode math symbols. It's unfortunate that we cannot tag math sequences (as with bidi) and that some alphabets have gaps. I think the tex community failed big here and still does. But I'm not in any loop .. remember: context is not supposed to do / be used for math, that's the persistent narrative.

Also, I don't think the majority of our users care about latex or whatever. They just ran into context and either didn't like it and left, or saw the use and fun of it and stayed. No one is forced to use tex (or any syntax). For me latex is to context what msword is to latex: often sort of an annoyance. Different worlds and mindsets too. (And we also have to be immune to some bashing: hard to install, always evolving, slow, weird version numbers, needs scripts (why lua), not that many users compared to latex, no math (needed because no articles), no manuals, huge showstopping bugs (in the engine) while users happily use it anyway, assumptions about how we think and why we decide as we do, etc. The usual social media crap. But still: lua(meta)tex comes from the context end, right? Users can use and run it, indeed? We write about it and make all (!) of it public, yes? No one is forced to use it, so?)

Concerning your comment: latex users (I suppose we are talking about those who publish articles) never run into context documents, and if they have the need for a better readable version, context can generate one; when they lack eyesight, sources can be provided, or we can discuss how to accommodate that ... the stupid tagging in pdf is pretty suboptimal (and likely also commercially and politically driven). We're not in that world; bad for one's health and mindset. Also, it is kind of weird to impose something (this EU law thing) that is not stable and will in the end lead to much invalid (intermediate) stuff (okay, money to be made by fixing it). Fighting windmills.
And the embedded xml blob is probably more reliable than any context -> latex math conversion. When it comes to math, I think most context users are in education, so that's what we focus on.
Yes, I also agree that focusing on MathML is probably the best way forwards.
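For readers unfamiliar with the content/presentation distinction that comes up in this thread, here is the same formula x = a/b written both ways (hand-written, illustrative fragments only; ConTeXt's actual export may differ):

```xml
<!-- presentation MathML: describes layout (a fraction bar) -->
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mi>x</mi><mo>=</mo>
  <mfrac><mi>a</mi><mi>b</mi></mfrac>
</math>

<!-- content MathML: describes meaning (an equality and a division) -->
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply><eq/>
    <ci>x</ci>
    <apply><divide/><ci>a</ci><ci>b</ci></apply>
  </apply>
</math>
```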
Although even there it's a mess: MathML support came to, went from, and came back to browsers (mathjax filled that gap well, I think; AsciiMath is a bit of a pain). That some of it was dropped from mathml because it was apparently hard to implement tells more about the programs, I guess. One can wonder why that took so long anyway. I saw some drafts that puzzle me. So far we could always adapt, but it doesn't get prettier over time. I like content mathml: predictable. It looked like openmath would follow it up, but that was a failure. Presentation mathml was always kind of dumb and stays that way, although they smuggle in some bits from content. But we'll adapt, and have to accept the 'older docs and versions are not doing ok' rants. If 30 years of history is representative, imagine the next 10 years. Ever been to a typesetting museum that spans the last 50 years?

The best one can do is just keep producing nice-looking documents and hope for the best. So rendering is what we focus on; that is the (small but appreciating) audience we have. Just watch the youtube videos about the voyager missions and how they fix things at a distance, or the apollo lander computers .. much we brag about today was (conceptually) invented then. It's the time I picked up on computing. And tex is that old, and still kicking, so let's keep it alive.
You need to keep in mind that when we started with all this there were no programs that did anything useful with tagging,
Viewer support is still very weak---MathML only works with Foxit and a version of NVDA released less than a month ago.
I never used those viewers and settled for sumatra on windows, and okular on windows and linux. All platforms? I suppose there is the usual interaction between the standard (adapting it), pdf-processing programs (can or can't or don't want to do it) and pdf generation (no comments there).
that the spec was (and is) not stable, that validation is a moving target, etc. ... it's all about adaptation, and it's always easy to point out whatever without looking at the past and the reality one has / had to deal with.
Yup, the very recent PDF 2.0 specification defines the <H> tags, and then the UA-2 spec arbitrarily decides that those are now invalid.
Exactly. And the sectioning problem is not solved. It's the usual: one starts from high-school writings, so a few sections, some itemize, a simple toc, maybe a figure, then a simple table. The same for typesetting: these are easy, and then comes the rest. Combine that with today's (social media) advertising of "we're the best and will do better" and the usual "free, professional, enterprise support" options (and hope for the best when they sell themselves or merge) and you will understand why I seldom (or never) check those out. My motto is "the problem doesn't change", and maybe there are plenty of ways to reach some goal; no need to compete and get the most users. We don't want unhappy enforced users (or at least not to be confronted by that fact). There are plenty of potential users out there as long as they are free to choose and not harassed by tex or competing evangelists (comparing and bragging about being better is often a sign of loss and desperation anyway; fine when a user says it, but a developer ...). One should just use what one likes best.
A few decades from now all this tagging will probably be seen as kind of ridiculous anyway.
Yeah, I'm also fairly skeptical of all this PDF tagging stuff, mainly because it seems much more compliance-driven than accessibility-driven. But it's also a classic chicken-and-egg problem between viewer support and document support, so if/when viewer support gets better, it should be much more useful.
Sure, but releasing a spec before actually implementing and playing with it ... I tend to follow the 'third attempt is the best' approach, so I try not to be driven by release fever. By going ISO, some long-term stability was signalled; the amount of patching and explaining since is, to me, alarming. Of course there is new stuff, like the balancing mvl that some play with, but it will likely be official around the meeting, more than a year after we started concentrating on it: first we make some documents that stress all of it, then as usual it can become stable and maybe occasionally be improved .. we're not in some competition. (Even the new par building and math are not used to full extent by users; much isn't even advocated but only mentioned in low-level manuals; expect no sales pitches.)

btw, about these nested mcids .. they come from the fact that we wanted to support math labels in metapost output ... we now just disable that, because after all, that math is meaningless without the drawing. The easy solutions: if it doesn't work, just disable it, or claim that it was unintended usage. But I admit that one should say that beforehand, not when a user runs out of luck. Normally we try to solve it and avoid loose ends. Try to predict extreme usage patterns (comes with age, I guess).

Hans

ps. Sorry, too long and too many typos (working on a large screen at 3 m distance).

ps. About mathml embedding: we already had something like that. We also did that with tables in the good old pdftex times: embed tables as excel xml .. one could just click on it and it worked fine; it's probably still there (on my machine) .. the good old times of hundreds of thousands of hyperlinks and so on ... 2500+ page documents ...
It's easier today and also faster, but it's not like tex was ever behind (adobe nl representatives used some context docs to demonstrate possibilities of pdf that they couldn't render themselves).

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------