[NTG-pdftex] processing speed

Reinhard Kotucha reinhard.kotucha at web.de
Sat May 30 00:01:19 CEST 2009

On 29 May 2009 Hans Hagen wrote:

 > The Thanh Han wrote:
 > > indeed it must be something with string recycling (in
 > > tex.ch) during "\input something"; however both tex & pdftex
 > > (in texlive) seem not to release the filename string, but
 > > pdftex takes longer to process later \input's. FWIW,
 > > pdftex doesn't create new strings in Reinhard's example (it
 > > runs in dvi mode).
 > i remember a discussion about this filename issue and that it is 
 > supposed to be reclaimed when it's the last thing added to the pool 
 > (some etex thing) but i cannot find anything on the web

I remember vaguely that there was a discussion a couple of years
ago.  The string pool problem was very well known at that time;
the old version of Keith Reckdahl's epslatex.pdf suggested specifying
full filenames (with extensions) as arguments to \includegraphics.
Otherwise \includegraphics would use \openin in order to search for a
file with an appropriate extension, which increases the string pool.
At that time the small size of the string pool was problematic, not
processing speed.
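
To illustrate the advice about full filenames, here is a minimal sketch.
The file name figure.eps is only a placeholder (an assumption for this
example); such a file must exist for the document to compile, and the
example assumes DVI mode with the graphicx package.

```latex
\documentclass{article}
\usepackage{graphicx}
\begin{document}
% Extension given: the file can be opened directly, so no search over
% candidate extensions should be needed.
\includegraphics{figure.eps}

% Extension omitted: \includegraphics has to probe for a matching file,
% which is the case the old epslatex.pdf advised against.
\includegraphics{figure}
\end{document}
```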

Olaf Weber once said that he planned to solve this problem but I never
heard anything about it again.  If this is exactly the problem we are
talking about, it seems that he fixed it because Knuth's TeX doesn't
have this problem any more.  There have been a few changes in pdfTeX
since then; one is the integration of e-TeX, but there has also been
an upgrade of TeX itself (3.1415926).

Actually, my problem has nothing to do with file names.  My Perl
script produces one big LaTeX file (30 MB) and after the preamble,
\input isn't used at all.

Does the string pool contain the hash for control sequences?  This
would explain the behavior.  From texmf.cnf:

% Max number of characters in all strings, including all error messages,
% help texts, font names, control sequences.  These values apply to TeX and MP.
pool_size = 1250000

Maybe PGF creates a lot of control sequences at runtime, using \csname
and \endcsname in macros.  This would increase the control sequence
hash and then it takes more time to find a particular macro.

But if they are created dynamically at runtime, they are created
within a group (\begin{tikzpicture}...\end{tikzpicture}) and I expect
that everything created within a particular group is removed from the
hash after \endgroup.
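
A small test could show whether control sequences created with \csname
inside a group still occupy the hash and the string pool after the group
ends.  The following is only a sketch: the counter \testcount, the names
test1 ... test10000 and the count of 10000 are arbitrary choices for
illustration, not something PGF is known to do.

```latex
\documentclass{article}
\tracingstats=1            % ask for memory usage statistics in the log
\newcount\testcount        % hypothetical counter name for this test
\begin{document}
\begingroup
  \testcount=0
  \loop\ifnum\testcount<10000
    \advance\testcount by 1
    % Locally define \test1, \test2, ... via \csname ... \endcsname.
    \expandafter\def\csname test\number\testcount\endcsname{x}%
  \repeat
\endgroup                  % the local definitions go out of scope here
Done.
\end{document}
```

Comparing the "strings", "string characters" and "multiletter control
sequences" lines at the end of the log with and without the loop should
indicate whether the entries are released when the group ends.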

I have no idea what's happening.  BTW, please excuse me for not
providing a better test file.  I assumed that the problem is caused by
large files, not by \input.  But the idea was to create a test file
which works with pdfTeX and DEK's TeX in order to find out whether
they behave differently.

With pgfplots the problem is more obvious: I started pdftex in the
late morning.  When I later looked into the log file I noticed that it
had already created more than 700 pages.  Thus, I assumed that I would
get a result after the lunch break.  But it finished in the evening, two
minutes before I had to shut down the computer; otherwise I would have
missed the train to Hannover.

It would be good if the problem could be solved, one way or the other.
But since I'm obviously the only one who has encountered this problem and
the problem has obviously existed for years, I propose to change nothing
before TeX Live 2009 is released.


Reinhard Kotucha			              Phone: +49-511-3373112
Marschnerstr. 25
D-30167 Hannover	                      mailto:reinhard.kotucha at web.de
Microsoft isn't the answer. Microsoft is the question, and the answer is NO.
