Parallelizing typesetting of large documents with lots of cross-references

3 Dec 2020

      Hello,

This email is largely a simple notification of one "Fool's" dream...

("Only Fools rush in where Angels fear to tread").

I am currently attempting to create "a" (crude) "tool" with which I can
typeset:

- very large (1,000+ pages),
- highly cross-referenced documents,
- with embedded literate-programmed code (which needs
  concurrent compiling and execution),
- containing multiple MetaFun graphics,

all based upon ConTeXt-LMTX.

"In theory", it should be possible to typeset individual "sub-documents"
(any section which is known to start on a page boundary rather than
inside a page), and then re-combine the individual PDFs back into one
single PDF for the whole document (complete with control over the page
numbering).

The inherent problem is that the *whole* of a ConTeXt document depends
upon cross-references from *everywhere* else in the document. TeX and
ConTeXt "solve" this problem by using a multi-pass approach (in, for
example, 5 passes for the `luametatex` document).

Between each pass, ConTeXt saves this multi-pass data (page
numbers and cross-references) in the `*.tuc` file.

Clearly any parallelization approach needs to have a process which
coordinates the update and re-distribution of any changes in this
multi-pass data obtained by typesetting each "sub-document".

My current approach is to have a federation of Docker/Podman "pods".
Each "pod" would have a number of ConTeXt workers, as well as
(somewhere in the federation) a Lua based Multi-Pass-Data-coordinator.

All work would be coordinated by messages sent and received over a
corresponding federation of [NATS servers](https://nats.io/). (Neither
[Podman](https://podman.io/) pods nor NATS message coordination are
problems at the moment).

--------------------------------------------------------------------
**The real problem**, for typesetting a ConTeXt document, is the design
of the critical process which will act as a
"Multi-Pass-Data-coordinator".
--------------------------------------------------------------------

All ConTeXt sub-documents would be typeset in "once" mode using the
latest complete set of "Multi-Pass-Data" obtained from the central
coordinator. Then, once each typesetting run is complete, the resulting
"Multi-Pass-Data" would be sent back to the coordinator to be used to
update the coordinator's complete set of "Multi-Pass-Data" ready for
any required next typesetting pass.

(From the `context --help`:
...
mtx-context | --once only run once (no multipass data file is produced)
I will clearly have to patch(?) the mtx-context.lua script to allow
multipass data to be produced... this is probably not a problem).
(There would also be a number of additional processes/containers for
dependency analysis, build sequencing, compilation of code,
execution or interpretation of the code, stitching the PDFs back into
one PDF, etc -- these processes are also not the really critical
problem at the moment).

--------------------------------------------------------------------
QUESTIONS:

1. Are there any other known attempts to parallelize context?

2. Are there any other obvious problems with my approach?

3. Is there any existing documentation on the contents of the `*.tuc`
   file?

4. If there is no such documentation, is there any naming pattern of
   the Lua functions which get/set this multi-pass information that I
   should be aware of?
--------------------------------------------------------------------

Many thanks for all of the very useful comments so far...

Regards,

Stephen Gaito

Stephen Gaito

Taco Hoekwater

Hans Hagen

tags

participants (3)