[NTG-pdftex] processing speed

Philip TAYLOR (Ret'd) P.Taylor at Rhul.Ac.Uk
Sat May 30 00:13:40 CEST 2009

Reinhard Kotucha wrote:

 > Does the string pool contain the hash for control sequences?  This
 > would explain the behavior.  From texmf.cnf:
 > % Max number of characters in all strings, including all error messages,
 > % help texts, font names, control sequences.  These values apply to TeX and MP.
 > pool_size = 1250000
 > Maybe PGF creates a lot of control sequences at runtime, using \csname
 > and \endcsname in macros.  This would increase the control sequence
 > hash and then it takes more time to find a particular macro.
 > But if they are created dynamically at runtime, they are created
 > within a group (\begin{tikzpicture}...\end{tikzpicture}) and I expect
 > that everything created within a particular group is removed from the
 > hash after \endgroup.

I no longer remember if that is the case, but DEK's
words that I cited earlier were extracted from a much
longer message which I now repeat /verbatim/ below;
it certainly makes references to the impact of control
sequences on the string pool.

** Phil.
 > File name overflow of string pool
 > [ Since this report, I have seen a couple of other reports on this
 >   topic in the electronic discussion lists, mostly from Europe.
 >   While not a bug, it can certainly be a serious inconvenience.
 >   A couple of the reports have mentioned building nonstandard
 >   versions of TeX with a separate pool of file names; not good for
 >   compatibility. ]
 > Date: Fri, 12 Jul 91 19:06 +0200
 > From: "Johannes L. Braams" <J.L.Braams%pttrnl.nl at pucc.PRINCETON.EDU>
 > Subject: Bug/misfeature in TeX?
 >         We have run into a problem with TeX.
 >         We have an application where we would like to \input about
 >         2400 files. We can't do that because TeX runs out of string pool
 >         space. This application is rather important because it concerns
 >         the reports the lab has to make each quarter of a year.
 >         When I studied TeX the program to find out what happens when a file
 >         is being \input I found that the name of the file is stored in
 >         string pool. AND it never gets removed from the string pool (as far
 >         as I could find out).
 >         What I don't understand is why filenames are written to string pool
 >         in the first place.
 >         Isn't it possible to use some kind of stack or array mechanism to
 >         store filenames? It should then be possible to free the memory
 >         used to store a filename when the file gets closed and the filename
 >         is no longer needed.
 >         Do you know the answer or someone who does? Or is this a bug? I would
 >         rather call it a design flaw actually.
 >     Regards,
 >         Johannes Braams
 > PTT Research Neher Laboratorium,        P.O. box 421,
 > 2260 AK Leidschendam,                   The Netherlands.
 > Phone    : +31 70 3325051               E-mail : JL_Braams at pttrnl.nl
 > Fax      : +31 70 3326477
 >  -------
 > Date: Mon, 15 Jul 91 01:59:22 BST
 > From: Chris Thompson <CET1 at phoenix.cambridge.ac.uk>
 > Subject: Re: Bug/misfeature in TeX?
 > I agree that it's a design flaw, not a bug. People do keep falling
 > over it from time to time, though, so maybe Don could be asked to
 > think about it again. I suspect, however, that there is no easy fix,
 > for reasons I will explain below.
 > Johannes asks why the names go in the string pool in the first place:
 > the answer to that is "why not?"... it is the convenient place to keep
 > more or less arbitrarily long strings. The space occupied by things
 > added to the string pool can be reclaimed, provided it is done straight
 > away, before other parts of TeX have been exercised that may add other
 > strings (especially, control sequence names) to the pool. There are
 > two types of file name to think about (neither of which are reclaimed
 > at the moment, with one partial---and wrong---exception):
 > 1. The 1, 2 or 3 strings generated by |scan_file_name|. Usually these
 >    are used in some implementation-dependent way to open a file, and
 >    maybe then as arguments to |*_make_name_string|, and are then never
 >    needed again; and all this would usually happen straight away.
 >    Exception: deferred (non-\immediate) \openout's.
 > 2. The string generated by |*_make_name_string|. For things like the
 >    log and DVI files, this has to be kept for ever (printing them is
 >    almost the last thing TeX does). The interesting case, however, is
 >    \input. The string is printed (immediately), and then stored in the
 >    |name_field| of the current input stack entry. *Almost* the only
 >    thing TeX uses it for thereafter is as a number > 17 (to distinguish
 >    the case of an input level being an \input file, as opposed to
 >    terminal input or a \read level). The sole exception is in section
 >    84 where it is used to deal with the "E" response to the error
 >    prompt: in distribution TeX as part of a message, but in practice
 >    as input to the implementation-dependent way of invoking an editor.
 > The ``partial and wrong exception'' is the code in section 537
 > introduced by change 283. |start_input| reclaims the space occupied
 > by the result of |a_make_name_string|, if that is still the top string
 > in the pool, and replaces it by the `name' part of the results of
 > |scan_file_name|. I have had to undo this "fix" in my implementations:
 > the *only* thing that the ``file name'' is needed for is as an argument
 > to the editor, and it is an unwarranted assumption that
 > a. The values of the `area' and `extension' parts of the name are
 >    irrelevant to that purpose, and
 > b. The output of |a_make_name_string| doesn't contain extra information,
 >    available as a result of the opening process, that may also be
 >    relevant.
 > In theory the contents of the strings of type 2 for \input files could
 > be kept on some sort of separate stack, as Johannes suggests (parallel
 > to the |input_file| and |line_stack| arrays), but this would be quite
 > convoluted and involve a lot of duplication of code. More plausible
 > would be an attempt to reclaim them if they are still the top string
 > in the pool when the file is closed (in |end_file_reading|); this isn't
 > so unlikely in cases like Johannes'... presumably not all 2400 files
 > can use never-before-encountered control sequences, or he will be
 > running out of other things besides the string pool!
 > The strings of type 1 create a difficulty, however, unless they can
 > be got rid of just after the call of |a_make_name_string| (a certain
 > amount of permuting of the string pool would be required to do that).
 > If they, also, are to be got rid of when the file is closed, again
 > subject to the condition that they are at the top of the pool, one
 > will have to (at least) remember how many of them there were.
 > Some of this would, in fact, be rather easier in METAFONT than TeX.
 > METAFONT's string pool entries have a use count, and reclaiming space
 > consists of purging consecutive entries at the top of the pool whose
 > use counts have all fallen to zero. One could easily arrange that the
 > strings of type 1 had use counts of zero after the opening process was
 > over, and that the strings of type 2 for "input" files had a use count
 > of 1 which was decremented to 0 at close time; then the right things
 > would happen more or less automatically. However, TeX *doesn't* have
 > such use counts, and I don't really suppose Don is going to introduce
 > them in order to solve this problem.
 > Chris Thompson
 >  -------
 >   [ dek:
 >         I think the strings are also needed for font file names.
 >         For ordinary input files I put the special code into \S537
 >         [which CET1 disabled] so that the Math Reviews could input
 >         lots of files.
 >         Of course there's a workaround (using the operating system
 >         to concatenate files!) but otherwise all I can suggest is a
 >         local change-file routine that tries to reclaim string space
 >         when closing files if the unneeded strings are still at the
 >         end of the string pool.  You could introduce a new array
 >         indexed by 1..max_in_open to keep relevant status information
 >         if it isn't already present (see \S304).
 >   ]
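
To make the "reclaim it only if it is still the top string" idea
concrete, here is a minimal sketch in Python. It is a toy model, not
TeX's actual code: the class StringPool, the methods make_string and
flush_if_top, the helper input_many, the file names and the pool size
of 32000 are all assumptions chosen for illustration. It only shows
why reclaiming at close time helps when nothing else has been added
to the pool in the meantime, and why it fails once a new control
sequence name sits on top of the file name.

```python
class StringPool:
    """Toy string pool: one character buffer plus a list of start offsets."""

    def __init__(self, pool_size):
        self.pool_size = pool_size
        self.chars = []      # characters of all strings, stored back to back
        self.starts = [0]    # starts[i] = offset of string i; last entry = free pointer

    def make_string(self, text):
        """Append text as a new string and return its number."""
        if len(self.chars) + len(text) > self.pool_size:
            raise MemoryError("pool size exceeded")
        self.chars.extend(text)
        self.starts.append(len(self.chars))
        return len(self.starts) - 2

    def flush_if_top(self, s):
        """Reclaim string s, but only if it is still the newest string."""
        if s != len(self.starts) - 2:
            return False     # something was added after s; it cannot be freed
        self.starts.pop()
        del self.chars[self.starts[-1]:]
        return True

    def used(self):
        return len(self.chars)


def input_many(pool, count, reclaim):
    """Simulate \\input of `count` files that define no new control sequences."""
    for i in range(count):
        name = pool.make_string("report%04d.tex" % i)   # hypothetical file name
        # ... the file would be read here ...
        if reclaim:
            pool.flush_if_top(name)                     # done at "close" time
    return pool.used()


if __name__ == "__main__":
    print(input_many(StringPool(32000), 2400, reclaim=True))
    try:
        input_many(StringPool(32000), 2400, reclaim=False)
    except MemoryError as err:
        print("without reclaiming:", err)
```

If a file does introduce a never-before-seen control sequence, its
name lands on top of the file name in this model, flush_if_top returns
False, and the file name stays in the pool for good.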
