On 12/16/2008 2:08 AM, Taco Hoekwater wrote:
Hi Lars,
Lars Huttar wrote:
Hello,
We've been using TeX to typeset a 1200-page book, and at that size, the time it takes to run becomes a big issue (especially with multiple passes... about 8 on average). It takes us anywhere from 80 minutes on our fastest machine, to 9 hours on our slowest laptop.
You should not need an average of 8 runs unless your document is ridiculously complex and I am curious what you are doing (but that is a different issue from what you are asking).
So the question comes up, can TeX runs take advantage of parallelized or distributed processing?
No. For the most part, this is because of another requisite: for applications to make good use of threads, they have to deal with a problem that can be parallelized well. And generally speaking, typesetting does not fall in this category. A seemingly small change on page 4 can easily affect each and every page right to the end of the document.
Thank you for your response. Certainly this is true in general and in the worst case, as things stand currently. But I don't think it has to be that way. The following could greatly mitigate that problem:

- You could design your document *specifically* to make the parts independent, so that the true and authoritative way to typeset them is to typeset the parts independently. (You can do this part now without modifying TeX at all... you just have the various sections' .tex files input common "headers" / macro defs.) Then, by definition, a change in one section cannot affect another section (except for page numbers, and possibly left/right pages, q.v. below).

- Most large works are divisible into chunks separated by page breaks, and possibly by page breaks that force a "recto". This greatly limits the effects that any section can have on another. The division ("chunking") of the whole document into fairly separate parts could either be done manually or, if there are clear page breaks, automatically.

- The remaining problem, as you noted, is how to fix page references from one section to another. Currently, TeX resolves forward references by doing a second (or third, ...) pass, which uses page information from the previous pass. The same technique could be used for resolving inter-chunk references and determining on what page each chunk should start.

After one pass on each of the independent chunks (ideally performed simultaneously by separate processing nodes), page information is sent from each node to a "coordinator" process. E.g. the node that processed section two tells the coordinator that chapter 11 starts 37 pages after the beginning of section two. The coordinator knows in what sequence the chunks are to be concatenated, thanks to a config file. It uses this information together with info from each of the nodes to build a table of what page each chunk should start on, and a table giving the absolute page number of each page reference. If pagination has changed, or is new, this info is sent back to the various nodes for another round of processing.

If this distributed method of typesetting a document takes 1 additional iteration compared to doing it in series, but you get to split the document into, say, 5 roughly equal parts, you could presumably get the job done a lot quicker in spite of the extra iteration. This is a crude description, but hopefully the idea is clear enough.
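To make the coordinator step a bit more concrete, here is a rough Python sketch of how the two tables might be built. It is only an illustration under assumptions of my own: the names `ChunkReport`, `build_tables` and `needs_another_pass` are hypothetical, each node is assumed to report its page count plus the offset (in pages from the start of its chunk) of every labelled location, and a "recto" is assumed to be an odd-numbered page. None of this is existing TeX functionality.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkReport:
    """What one node sends back after typesetting its chunk (hypothetical format)."""
    name: str
    page_count: int
    # label -> offset in pages from the first page of this chunk (0 = first page)
    labels: dict = field(default_factory=dict)
    starts_recto: bool = True  # assumption: chunk must begin on an odd page


def build_tables(order, reports, first_page=1):
    """Return (chunk start pages, absolute page of each label).

    `order` is the concatenation sequence from the config file;
    `reports` maps chunk name -> ChunkReport.
    """
    starts = {}
    refs = {}
    page = first_page
    for name in order:
        report = reports[name]
        # Insert a blank page if the chunk must open on a recto (odd) page.
        if report.starts_recto and page % 2 == 0:
            page += 1
        starts[name] = page
        for label, offset in report.labels.items():
            refs[label] = page + offset
        page += report.page_count
    return starts, refs


def needs_another_pass(previous, current):
    """Another round is needed if pagination is new or has changed."""
    return previous != current


if __name__ == "__main__":
    reports = {
        "one": ChunkReport("one", 101, {"intro": 0}),
        "two": ChunkReport("two", 60, {"chapter11": 37}),
    }
    starts, refs = build_tables(["one", "two"], reports)
    print(starts)
    print(refs)
```

In this sketch the coordinator would send `starts` and `refs` back to the nodes and repeat until `needs_another_pass` reports no change, which mirrors the way a single TeX job reruns until its references settle.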
parallel pieces so that you could guarantee that you would get the same result for section B whether or not you were typesetting the whole book at the same time?
If you are willing to promise yourself that all chapters will be exactly 20 pages - no more, no less - then you can split the work off into separate job files yourself and take advantage of a whole server farm. If you can't ...
Yes, the splitting can be done manually now, and when the pain point gets high enough, we do some manual separate TeX runs. However, I'm thinking that for large works, there is enough gain to be had that it would be worth systematizing the splitting process and especially the recombining process, since the latter is more error-prone. I think people would do it a lot more if there were automation support for it. I know we would.

But then, maybe our situation of having a large book with dual columns and multipage tables is not common enough in the TeX world. Maybe others who are typesetting similar books just use commercial WYSIWYG typesetting tools, as we did in the previous edition of this book.

Lars