Re: [NTG-context] distributed / parallel TeX?

16 Dec 2008

On 12/16/2008 2:08 AM, Taco Hoekwater wrote:
...
Hi Lars,
Lars Huttar wrote:
...
Hello,
We've been using TeX to typeset a 1200-page book, and at that size, the
time it takes to run becomes a big issue (especially with multiple
passes... about 8 on average). It takes us anywhere from 80 minutes on
our fastest machine to 9 hours on our slowest laptop.
You should not need an average of 8 runs unless your document is
ridiculously complex, and I am curious what you are doing (but that
is a different issue from what you are asking).
...
So the question comes up, can TeX runs take advantage of parallelized or
distributed processing?
No. For the most part, this is because of another requisite: for
applications to make good use of threads, they have to deal with a
problem that can be parallelized well. And generally speaking,
typesetting does not fall in this category. A seemingly small change
on page 4 can easily affect each and every page right to the end
of the document.
Thank you for your response.

Certainly this is true in general and in the worst case, as things stand
currently. But I don't think it has to be that way. The following could
greatly mitigate that problem:

- You could design your document *specifically* to make the parts
independent, so that the true and authoritative way to typeset them is
to typeset the parts independently. (You can do this part now without
modifying TeX at all... you just have the various sections' .tex files
input common "headers" / macro defs.) Then, by definition, a change in
one section cannot affect another section (except for page numbers, and
possibly left/right pages, q.v. below).

- Most large works are divisible into chunks separated by page breaks,
some of which also force the next chunk to start on a recto (right-hand)
page. This greatly limits the effects that any section can have on
another. The division ("chunking")
of the whole document into fairly separate parts could be done either
manually or, where there are clear page breaks, automatically.

- The remaining problem, as you noted, is how to fix page references
from one section to another. Currently, TeX resolves forward references
by doing a second (or third, ...) pass, which uses page information from
the previous pass. The same technique could be used for resolving
inter-chunk references and determining on what page each chunk should
start. After one pass over each of the independent chunks (ideally performed
simultaneously by separate processing nodes), page information is sent
from each node to a "coordinator" process. E.g. the node that processed
section two tells the coordinator that chapter 11 starts 37 pages after
the beginning of section two. The coordinator knows in what sequence the
chunks are to be concatenated, thanks to a config file. It uses this
information together with info from each of the nodes to build a table
of what page each chunk should start on, and a table giving the absolute
page number of each page reference. If pagination has changed, or is
new, this info is sent back to the various nodes for another round of
processing.
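The coordinator's bookkeeping in the third point is essentially a running sum of page counts. Below is a minimal Python sketch, assuming each node reports its chunk's page count plus, for every label, how many pages after the chunk's first page it falls; ChunkReport, plan_pages, needs_another_round and the chunk names are hypothetical, not part of any TeX or ConTeXt interface.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkReport:
    """What one node reports after typesetting its chunk (hypothetical format)."""
    name: str
    page_count: int
    # label -> pages after the chunk's first page (0 means the first page)
    label_offsets: dict[str, int] = field(default_factory=dict)


def plan_pages(order, recto, reports):
    """Compute each chunk's starting page and each label's absolute page.

    order   -- chunk names in concatenation order (from the config file)
    recto   -- chunk name -> True if the chunk must open on a right-hand (odd) page
    reports -- chunk name -> ChunkReport from the latest pass
    """
    starts, labels = {}, {}
    next_page = 1
    for name in order:
        if recto.get(name, False) and next_page % 2 == 0:
            next_page += 1  # leave a blank verso so the chunk opens on a recto
        starts[name] = next_page
        report = reports[name]
        for label, offset in report.label_offsets.items():
            labels[label] = next_page + offset
        next_page += report.page_count
    return starts, labels


def needs_another_round(previous_plan, current_plan):
    """Nodes must rerun if any start page or label page differs from the last pass."""
    return previous_plan != current_plan


# Example: section two reports that chapter 11 starts 37 pages after its start.
order = ["front", "section1", "section2"]
recto = {"section1": True, "section2": True}
reports = {
    "front": ChunkReport("front", 12),
    "section1": ChunkReport("section1", 45, {"ch1": 0}),
    "section2": ChunkReport("section2", 80, {"section2": 0, "ch11": 37}),
}
starts, labels = plan_pages(order, recto, reports)
print(starts)  # {'front': 1, 'section1': 13, 'section2': 59}
print(labels)  # {'ch1': 13, 'section2': 59, 'ch11': 96}
print(needs_another_round(None, (starts, labels)))  # True: no earlier plan yet
```

The overall loop would be: typeset every chunk, call plan_pages, send the start pages and label pages back to the nodes, and repeat until needs_another_round returns False, since a chunk's page count can itself shift once the real page numbers are filled in.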

Even if this distributed method of typesetting a document takes one
additional iteration compared to doing it in series, splitting the
document into, say, five roughly equal parts should still get the job
done a lot quicker in spite of the extra iteration.
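The trade-off can be put in rough numbers. Assuming every full pass costs the same time T, the chunks are equal in size and run simultaneously, and coordination overhead is negligible, n serial passes take n*T while n + 1 distributed passes over k chunks take about (n + 1)*T/k. The function name below is hypothetical; the 8 passes come from the figure quoted earlier.

```python
def estimated_speedup(serial_passes: int, chunks: int, extra_passes: int = 1) -> float:
    """Ratio of serial wall-clock time to distributed wall-clock time.

    Assumes equal-cost passes, equal-size chunks processed simultaneously,
    and negligible coordination overhead (a best case, not a measurement).
    """
    serial_time = serial_passes * 1.0  # n passes of cost T, with T = 1
    distributed_time = (serial_passes + extra_passes) / chunks  # (n + 1) passes of cost T/k
    return serial_time / distributed_time


print(round(estimated_speedup(8, 5), 2))  # 4.44
```

With one chunk the extra pass makes things slightly slower (8/9), so the gain only appears once the work is actually spread over several nodes.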

This is a crude description but hopefully the idea is clear enough.
...
...
parallel pieces so that you could guarantee that you would get the same
result for section B whether or not you were typesetting the whole book
at the same time?
If you are willing to promise yourself that all chapters will be exactly
20 pages - no more, no less - then you can split the work off into
separate job files yourself and take advantage of a whole server
farm. If you can't ...
Yes, the splitting can be done manually now, and when the pain point
gets high enough, we do some manual separate TeX runs.

However, I'm thinking that for large works, there is enough gain to be
had that it would be worth systematizing the splitting process and
especially the recombining process, since the latter is more error-prone.

I think people would do it a lot more if there were automation support
for it. I know we would.
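As an illustration of the splitting half of that automation, here is a minimal Python sketch. It assumes a hypothetical convention where a comment line "% CHUNK: name" in the master source starts a new chunk, and that every chunk inputs a shared setup file, here given the hypothetical name common-setup; split_master, wrap_chunk and write_chunks are illustrative names, not an existing tool.

```python
import re
from pathlib import Path

# Hypothetical convention: a comment line "% CHUNK: name" starts a new chunk.
MARKER = re.compile(r"^%\s*CHUNK:\s*(\S+)\s*$")


def split_master(text: str) -> dict[str, list[str]]:
    """Split a master source into named chunks, in document order.

    Lines before the first marker are ignored in this sketch.
    """
    chunks: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        match = MARKER.match(line)
        if match:
            current = match.group(1)
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)
    return chunks


def wrap_chunk(body: list[str], header: str = "common-setup") -> str:
    """Turn a chunk body into a standalone ConTeXt file that inputs the shared setup."""
    return "\n".join([rf"\input {header}", r"\starttext", *body, r"\stoptext"]) + "\n"


def write_chunks(chunks: dict[str, list[str]], outdir: Path,
                 header: str = "common-setup") -> list[Path]:
    """Write one .tex file per chunk; returns the paths in concatenation order."""
    outdir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, body in chunks.items():
        path = outdir / f"{name}.tex"
        path.write_text(wrap_chunk(body, header), encoding="utf-8")
        paths.append(path)
    return paths


master = "\n".join([
    "% CHUNK: part1",
    "First part text.",
    "% CHUNK: part2",
    "Second part text.",
])
chunks = split_master(master)
print(list(chunks))  # ['part1', 'part2']
```

Because the dictionary keeps insertion order, the list returned by write_chunks doubles as the concatenation order a coordinator would need; recombining the resulting PDFs is left out of this sketch.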

But then, maybe our situation of having a large book with dual columns
and multipage tables is not common enough in the TeX world.
Maybe others who are typesetting similar books just use commercial
WYSIWYG typesetting tools, as we did in the previous edition of this book.

Lars