On Thu, 16 Apr 2020 at 16:38, Mojca Miklavec wrote:
On Thu, 16 Apr 2020 at 11:29, Taco Hoekwater wrote:
On 16 Apr 2020, at 11:12, Mojca Miklavec wrote:
I have been asked to create a few thousand PDF documents from a CSV "database" today
In CPU cycles, the fastest way is to do a single context --once run generating all the pages as a single document, then use mutool merge to split it into separate documents using a (shell) loop.
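For example, assuming the single run produces one page per final document in a file called all.pdf (the file name, output names and page count below are only placeholders), the split loop could look roughly like this:

    #!/bin/sh
    # extract each page of all.pdf into its own one-page PDF with mutool merge
    for p in $(seq 1 5000); do
        mutool merge -o "doc-$(printf '%04d' "$p").pdf" all.pdf "$p"
    done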
Just to make it clear: I don't really need to optimize on the CPU end,
... says the optimist ... :) :) :)
as the bottleneck is on the other side of the keyboard, so as long as the CPU can process 5k pages today, I'm fine with it :) :) :)
While the bottleneck was indeed on the other side of the keyboard (preparation certainly took longer than the execution), it still took about 2.5 hours to generate the full batch. I'm pretty sure I could have optimised the code further; even though 1 second per run is still pretty fast (when I started using context it was more like 30 seconds per run), it just adds up when talking about thousands of pages. This reminds me a lot of the awesome speedup Hans achieved when rewriting the mplib code, and of the initial \sometxt changes inside MetaPost, which also led to 100-fold speedups because one no longer needed to start TeX a zillion times.

While waiting I wanted to be clever and do the processing in parallel in the same folder (I have lots of cores after all), and ended up calling a script with

    context --N={n} --output=doc-{nnnn}.pdf template.tex
    context --purge

only to notice much later that running multiple context runs in the same folder (some of them compiling and some of them deleting the temporary files) might not have been the best idea on the planet: many documents ended up missing, and many were corrupted. So I had to rerun half of the documents.

One interesting statistic: I used a bunch of images (the same PNG images in all documents; roughly 290 k in total). The generated documents were 1.5 GB in size. When compressed with tar.gz, there was almost no noticeable difference between the compressed and uncompressed data size (1.4 GB vs. 1.5 GB). But tar.xz compressed the same 1.5 GB worth of documents into merely 27 MB (a single document is about 360 k).

The documents have been e-mailed out, but now hard copies need to be printed for the archive. I'm happy I don't need to be the one printing and storing that :) :) :)

Mojca
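P.S. In case anyone wants to run such jobs in parallel safely: the corruption above came from multiple runs sharing one folder, so the obvious fix is to give every run its own scratch directory. A minimal sketch (the context flags are the ones from my command above; the job cap, directory names and document count are just placeholders):

    #!/bin/bash
    # each run gets its own scratch directory, so the temporary files of
    # parallel runs can never clash; the finished PDFs can be collected
    # from the run-* directories afterwards
    for n in $(seq 1 5000); do
        nnnn=$(printf '%04d' "$n")
        mkdir -p "run-$nnnn"
        ( cd "run-$nnnn" &&
          context --N="$n" --output="doc-$nnnn.pdf" ../template.tex &&
          context --purge ) &
        # keep at most 8 runs going at once (wait -n needs bash >= 4.3)
        while (( $(jobs -rp | wc -l) >= 8 )); do wait -n; done
    done
    wait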