# [NTG-pdftex] processing speed

Philip TAYLOR (Ret'd) P.Taylor at Rhul.Ac.Uk
Sat May 30 00:13:40 CEST 2009


Reinhard Kotucha wrote:

> Does the string pool contain the hash for control sequences?  This
> would explain the behavior.  From texmf.cnf:
>
> % Max number of characters in all strings, including all error messages,
> % help texts, font names, control sequences.  These values apply to TeX and MP.
> pool_size = 1250000
>
> Maybe PGF creates a lot of control sequences at runtime, using \csname
> and \endcsname in macros.  This would increase the control sequence
> hash and then it takes more time to find a particular macro.
>
> But if they are created dynamically at runtime, they are created
> within a group (\begin{tikzpicture}...\end{tikzpicture}) and I expect
> that everything created within a particular group is removed from the
> hash after \endgroup.

I no longer remember if that is the case, but DEK's
words that I cited earlier were extracted from a much
longer message which I now repeat /verbatim/ below;
it certainly makes references to the impact of control
sequences on the string pool.

** Phil.
--------
> File name overflow of string pool
>
> [ Since this report, I have seen a couple of other reports on this
>   topic in the electronic discussion lists, mostly from Europe.
>   While not a bug, it can certainly be a serious inconvenience.
>   A couple of the reports have mentioned building nonstandard
>   versions of TeX with a separate pool of file names; not good for
>   compatibility. ]
>
> Date: Fri, 12 Jul 91 19:06 +0200
> From: "Johannes L. Braams" <J.L.Braams%pttrnl.nl at pucc.PRINCETON.EDU>
> Subject: Bug/misfeature in TeX?
>
>         We have run into a problem with TeX.
>         We have an application where we would like to \input about
>         2400 files. We can't do that because TeX runs out of string pool
>         space. This application is rather important because it concerns
>         the reports the lab has to make each quarter of a year.
>
>         When I studied TeX the program to find out what happens when a file
>         is being \input I found that the name of the file is stored in
>         string pool. AND it never gets removed from the string pool (as far
>         as I could find out).
>         What I don't understand is why filenames are written to string pool
>         in the first place.
>         Isn't it possible to use some kind of stack or array mechanism to
>         store filenames? It should then be possible to free the memory
>         used to store a filename when the file gets closed and the filename
>         is no longer needed.
>
>         Do you know the answer or someone who does? Or is this a bug? I would
>         rather call it a design flaw actually.
>
>     Regards,
>
>         Johannes Braams
>
> PTT Research Neher Laboratorium,        P.O. box 421,
> 2260 AK Leidschendam,                   The Netherlands.
> Phone    : +31 70 3325051               E-mail : JL_Braams at pttrnl.nl
> Fax      : +31 70 3326477
>  -------
> Date: Mon, 15 Jul 91 01:59:22 BST
> From: Chris Thompson <CET1 at phoenix.cambridge.ac.uk>
> Subject: Re: Bug/misfeature in TeX?
>
> I agree that it's a design flaw, not a bug. People do keep falling
> over it from time to time, though, so maybe Don could be asked to
> think about it again. I suspect, however, that there is no easy fix,
> for reasons I will explain below.
>
> Johannes asks why the names go in the string pool in the first place:
> the answer to that is "why not?"... it is the convenient place to keep
> more or less arbitrarily long strings. The space occupied by things
> added to the string pool can be reclaimed, provided it is done straight
> away, before other parts of TeX have been exercised that may add other
> strings (especially, control sequence names) to the pool. There are
> two types of file name to think about (neither of which are reclaimed
> at the moment, with one partial---and wrong---exception):
>
> 1. The 1, 2 or 3 strings generated by |scan_file_name|. Usually these
>    are used in some implementation-dependent way to open a file, and
>    maybe then as arguments to |*_make_name_string|, and are then never
>    needed again; and all this would usually happen straight away.
>    Exception: deferred (non-\immediate) \openout's.
>
> 2. The string generated by |*_make_name_string|. For things like the
>    log and DVI files, this has to be kept for ever (printing them is
>    almost the last thing TeX does). The interesting case, however, is
>    \input. The string is printed (immediately), and then stored in the
>    |name_field| of the current input stack entry. *Almost* the only
>    thing TeX uses it for thereafter is as a number > 17 (to distinguish
>    the case of an input level being an \input file, as opposed to
>    terminal input or a \read level). The sole exception is in section
>    84 where it is used to deal with the "E" response to the error
>    prompt: in distribution TeX as part of a message, but in practice
>    as input to the implementation-dependent way of invoking an editor.
>
> (BEGIN ASIDE
>
> The "partial and wrong exception" is the code in section 537
> introduced by change 283. |start_input| reclaims the space occupied
> by the result of |a_make_name_string|, if that is still the top string
> in the pool, and replaces it by the 'name' part of the results of
> |scan_file_name|. I have had to undo this "fix" in my implementations:
> the *only* thing that the "file name" is needed for is as an argument
> to the editor, and it is an unwarranted assumption that
>
> a. The values of the 'area' and 'extension' parts of the name are
>    irrelevant to that purpose, and
>
> b. The output of |a_make_name_string| doesn't contain extra information,
>    available as a result of the opening process, that may also be
>    relevant.
>
> END ASIDE)
>
> In theory the contents of the strings of type 2 for \input files could
> be kept on some sort of separate stack, as Johannes suggests (parallel
> to the |input_file| and |line_stack| arrays), but this would be quite
> convoluted and involve a lot of duplication of code. More plausible
> would be an attempt to reclaim them if they are still the top string
> in the pool when the file is closed (in |end_file_reading|); this isn't
> so unlikely in cases like Johannes'... presumably not all 2400 files
> can use never-before-encountered control sequences, or he will be
> running out of other things besides the string pool!
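
For illustration only: a minimal Python sketch of the "reclaim it if it is still the top string" idea in the paragraph above. It is a toy model, not TeX's WEB code. The class `StringPool`, its methods, and the helper `input_files` are hypothetical names; the only thing taken from the message is the rule that a string can be given back only while nothing newer sits above it in the pool.

```python
class StringPool:
    """Toy string pool: one flat character list plus a list of start offsets."""

    def __init__(self, pool_size):
        self.pool_size = pool_size
        self.chars = []      # all characters of all strings, back to back
        self.starts = [0]    # starts[i] is where string i begins; starts[-1] is the free end

    def make_string(self, text):
        """Append a string and return its number; fail when the pool is full."""
        if len(self.chars) + len(text) > self.pool_size:
            raise MemoryError("pool size exceeded")
        self.chars.extend(text)
        self.starts.append(len(self.chars))
        return len(self.starts) - 2

    def flush_if_top(self, s):
        """Reclaim string s, but only if it is the most recently made string."""
        if s < 0 or s != len(self.starts) - 2:
            return False
        del self.chars[self.starts[s]:]
        self.starts.pop()
        return True

    def used(self):
        return len(self.chars)


def input_files(pool, names, new_csname_in=()):
    """Model \\input of each file; files listed in new_csname_in define a new control sequence."""
    for name in names:
        s = pool.make_string(name)              # file name goes into the pool
        if name in new_csname_in:
            pool.make_string("cs:" + name)      # a newer string now sits above the file name
        pool.flush_if_top(s)                    # attempted reclamation at close time
    return pool.used()
```

In this model, 2400 files that define nothing new leave the pool empty, while each file that introduces a never-before-seen control sequence name pins its own file name in the pool for good.
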
>
> The strings of type 1 create a difficulty, however, unless they can
> be got rid of just after the call of |a_make_name_string| (a certain
> amount of permuting of the string pool would be required to do that).
> If they, also, are to be got rid of when the file is closed, again
> subject to the condition that they are at the top of the pool, one
> will have to (at least) remember how many of them there were.
>
> Some of this would, in fact, be rather easier in METAFONT than TeX.
> METAFONT's string pool entries have a use count, and reclaiming space
> consists of purging consecutive entries at the top of the pool whose
> use counts have all fallen to zero. One could easily arrange that the
> strings of type 1 had use counts of zero after the opening process was
> over, and that the strings of type 2 for "input" files had a use count
> of 1 which was decremented to 0 at close time; then the right things
> would happen more or less automatically. However, TeX *doesn't* have
> such use counts, and I don't really suppose Don is going to introduce
> them in order to solve this problem.
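
Again for illustration only: a small Python sketch of the use-count scheme described in the paragraph above, loosely modelled on the description given there and not on METAFONT's actual source. `CountedPool` and its method names are hypothetical. Type 1 strings are created with a use count of zero, the type 2 string with a count of one that drops to zero at close time.

```python
class CountedPool:
    """Toy pool in which every string carries a use count."""

    def __init__(self):
        self.strings = []   # entries in creation order; the last one is the top of the pool
        self.refs = []      # use count of each entry

    def make_string(self, text, refs=1):
        self.strings.append(text)
        self.refs.append(refs)
        return len(self.strings) - 1

    def release(self, s):
        """Drop one use of string s, then purge whatever has become free at the top."""
        if self.refs[s] > 0:
            self.refs[s] -= 1
        self.purge_top()

    def purge_top(self):
        """Remove consecutive entries at the top whose use counts are all zero."""
        while self.refs and self.refs[-1] == 0:
            self.refs.pop()
            self.strings.pop()

    def used(self):
        return sum(len(t) for t in self.strings)


def open_input(pool, area, name, ext):
    """Create the type 1 strings (count 0) and the type 2 string (count 1)."""
    for part in (area, name, ext):
        pool.make_string(part, refs=0)
    return pool.make_string(area + name + ext, refs=1)
```

Releasing the type 2 string at close time frees all four entries at once, provided nothing with a non-zero count was created above them in the meantime; otherwise they are freed later, as soon as that newer entry is itself released.
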
>
> Chris Thompson
>  -------
>
>   [ dek:
>         I think the strings are also needed for font file names.
>         For ordinary input files I put the special code into \S537
>         [which CET1 disabled] so that the Math Reviews could input
>         lots of files.
>         Of course there's a workaround (using the operating system
>         to concatenate files!) but otherwise all I can suggest is a
>         local change-file routine that tries to reclaim string space
>         when closing files if the unneeded strings are still at the
>         end of the string pool.  You could introduce a new array
>         indexed by 1..max_in_open to keep relevant status information
>         if it isn't already present (see \S304).
>   ]
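
For illustration only: a Python sketch of the kind of bookkeeping dek suggests above, namely an array indexed by input level that remembers enough status to give the file-name strings back when the file is closed. This is a toy model under stated assumptions, not a change file. `start_input`, `end_file_reading` and `max_in_open` are names from the messages above; `Pool`, `InputStack`, the `status` array, and the value 15 are hypothetical.

```python
MAX_IN_OPEN = 15  # assumed limit on simultaneously open input files


class Pool:
    """Toy string pool: a list of strings; str_ptr is the number of the next string."""

    def __init__(self):
        self.strings = []

    def make_string(self, text):
        self.strings.append(text)
        return len(self.strings) - 1

    @property
    def str_ptr(self):
        return len(self.strings)


class InputStack:
    def __init__(self, pool):
        self.pool = pool
        self.in_open = 0
        # status[level] = (str_ptr before opening, str_ptr just after opening)
        self.status = [None] * (MAX_IN_OPEN + 1)

    def start_input(self, area, name, ext):
        if self.in_open == MAX_IN_OPEN:
            raise RuntimeError("too many open input files")
        before = self.pool.str_ptr
        for part in (area, name, ext):            # the type 1 strings
            self.pool.make_string(part)
        self.pool.make_string(area + name + ext)  # the type 2 string
        self.in_open += 1
        self.status[self.in_open] = (before, self.pool.str_ptr)

    def end_file_reading(self):
        """Close the current file; reclaim its strings if nothing newer sits above them."""
        before, after = self.status[self.in_open]
        self.status[self.in_open] = None
        self.in_open -= 1
        if self.pool.str_ptr == after:
            del self.pool.strings[before:]
            return True
        return False
```

Recording both pool positions also answers the point about remembering how many type 1 strings there were: the pair says exactly which strings belong to that input level, and nested files unwind correctly because the inner file is always closed first.
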