[NTG-pdftex] processing speed
Philip TAYLOR (Ret'd)
P.Taylor at Rhul.Ac.Uk
Sat May 30 00:13:40 CEST 2009
Reinhard Kotucha wrote:
> Does the string pool contain the hash for control sequences? This
> would explain the behavior. From texmf.cnf:
>
> % Max number of characters in all strings, including all error messages,
> % help texts, font names, control sequences. These values apply to TeX and MP.
> pool_size = 1250000
>
> Maybe PGF creates a lot of control sequences at runtime, using \csname
> and \endcsname in macros. This would increase the control sequence
> hash and then it takes more time to find a particular macro.
>
> But if they are created dynamically at runtime, they are created
> within a group (\begin{tikzpicture}...\end{tikzpicture}) and I expect
> that everything created within a particular group is removed from the
> hash after \endgroup.
I no longer remember if that is the case, but DEK's
words that I cited earlier were extracted from a much
longer message which I now repeat /verbatim/ below;
it certainly makes references to the impact of control
sequences on the string pool.
** Phil.
--------
> File name overflow of string pool
>
> [ Since this report, I have seen a couple of other reports on this
> topic in the electronic discussion lists, mostly from Europe.
> While not a bug, it can certainly be a serious inconvenience.
> A couple of the reports have mentioned building nonstandard
> versions of TeX with a separate pool of file names; not good for
> compatibility. ]
>
> Date: Fri, 12 Jul 91 19:06 +0200
> From: "Johannes L. Braams" <J.L.Braams%pttrnl.nl at pucc.PRINCETON.EDU>
> Subject: Bug/misfeature in TeX?
>
> We have run into a problem with TeX.
> We have an application where we would like to \input about
> 2400 files. We can't do that because TeX runs out of string pool
> space. This application is rather important because it concerns
> the reports the lab has to make each quarter of a year.
>
> When I studied TeX the program to find out what happens when a file
> is being \input I found that the name of the file is stored in
> string pool. AND it never gets removed from the string pool (as far
> as I could find out).
> What I don't understand is why filenames are written to string pool
> in the first place.
> Isn't it possible to use some kind of stack or array mechanism to
> store filenames? It should then be possible to free the memory
> used to store a filename when the file gets closed and the filename
> is no longer needed.
>
> Do you know the answer or someone who does? Or is this a bug? I would
> rather call it a design flaw actually.
>
> Regards,
>
> Johannes Braams
>
> PTT Research Neher Laboratorium, P.O. box 421,
> 2260 AK Leidschendam, The Netherlands.
> Phone : +31 70 3325051 E-mail : JL_Braams at pttrnl.nl
> Fax : +31 70 3326477
> -------
> Date: Mon, 15 Jul 91 01:59:22 BST
> From: Chris Thompson <CET1 at phoenix.cambridge.ac.uk>
> Subject: Re: Bug/misfeature in TeX?
>
> I agree that it's a design flaw, not a bug. People do keep falling
> over it from time to time, though, so maybe Don could be asked to
> think about it again. I suspect, however, that there is no easy fix,
> for reasons I will explain below.
>
> Johannes asks why the names go in the string pool in the first place:
> the answer to that is "why not?"... it is the convenient place to keep
> more or less arbitrarily long strings. The space occupied by things
> added to the string pool can be reclaimed, provided it is done straight
> away, before other parts of TeX have been exercised that may add other
> strings (especially, control sequence names) to the pool. There are
> two types of file name to think about (neither of which are reclaimed
> at the moment, with one partial---and wrong---exception):
>
> 1. The 1, 2 or 3 strings generated by |scan_file_name|. Usually these
> are used in some implementation-dependent way to open a file, and
> maybe then as arguments to |*_make_name_string|, and are then never
> needed again; and all this would usually happen straight away.
> Exception: deferred (non-\immediate) \openout's.
>
> 2. The string generated by |*_make_name_string|. For things like the
> log and DVI files, this has to be kept for ever (printing them is
> almost the last thing TeX does). The interesting case, however, is
> \input. The string is printed (immediately), and then stored in the
> |name_field| of the current input stack entry. *Almost* the only
> thing TeX uses it for thereafter is as a number > 17 (to distinguish
> the case of an input level being an \input file, as opposed to
> terminal input or a \read level). The sole exception is in section
> 84 where it is used to deal with the "E" response to the error
> prompt: in distribution TeX as part of a message, but in practice
> as input to the implementation-dependent way of invoking an editor.
>
> (BEGIN ASIDE
>
> The ``partial and wrong exception'' is the code in section 537
> introduced by change 283. |start_input| reclaims the space occupied
> by the result of |a_make_name_string|, if that is still the top string
> in the pool, and replaces it by the `name' part of the results of
> |scan_file_name|. I have had to undo this "fix" in my implementations:
> the *only* thing that the ``file name'' is needed for is as an argument
> to the editor, and it is an unwarranted assumption that
>
> a. The values of the `area' and `extension' parts of the name are
> irrelevant to that purpose, and
>
> b. The output of |a_make_name_string| doesn't contain extra information,
> available as a result of the opening process, that may also be
> relevant.
>
> END ASIDE)
>
> In theory the contents of the strings of type 2 for \input files could
> be kept on some sort of separate stack, as Johannes suggests (parallel
> to the |input_file| and |line_stack| arrays), but this would be quite
> convoluted and involve a lot of duplication of code. More plausible
> would be an attempt to reclaim them if they are still the top string
> in the pool when the file is closed (in |end_file_reading|); this isn't
> so unlikely in cases like Johannes'... presumably not all 2400 files
> can use never-before-encountered control sequences, or he will be
> running out of other things besides the string pool!
>
> The strings of type 1 create a difficulty, however, unless they can
> be got rid of just after the call of |a_make_name_string| (a certain
> amount of permuting of the string pool would be required to do that).
> If they, also, are to be got rid of when the file is closed, again
> subject to the condition that they are at the top of the pool, one
> will have to (at least) remember how many of them there were.
>
> Some of this would, in fact, be rather easier in METAFONT than TeX.
> METAFONT's string pool entries have a use count, and reclaiming space
> consists of purging consecutive entries at the top of the pool whose
> use counts have all fallen to zero. One could easily arrange that the
> strings of type 1 had use counts of zero after the opening process was
> over, and that the strings of type 2 for "input" files had a use count
> of 1 which was decremented to 0 at close time; then the right things
> would happen more or less automatically. However, TeX *doesn't* have
> such use counts, and I don't really suppose Don is going to introduce
> them in order to solve this problem.
>
> Chris Thompson
> -------
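
The use-count scheme Chris attributes to METAFONT can be illustrated with
a toy model. This is only a sketch in Python, not METAFONT's or TeX's
actual code: the class CountedPool, the helper input_file, and the file
names are all hypothetical, and the pool is measured simply in characters.

```python
class CountedPool:
    """Toy string pool whose entries carry use counts (hypothetical model)."""

    def __init__(self, pool_size):
        self.pool_size = pool_size  # maximum total characters
        self.entries = []           # [text, use_count] in allocation order
        self.used = 0               # characters currently occupied

    def make_string(self, text, use_count=1):
        if self.used + len(text) > self.pool_size:
            raise MemoryError("string pool overflow")
        self.entries.append([text, use_count])
        self.used += len(text)
        return len(self.entries) - 1

    def release(self, index):
        self.entries[index][1] -= 1
        self._purge_top()

    def _purge_top(self):
        # Only consecutive zero-count entries at the top can be freed.
        while self.entries and self.entries[-1][1] == 0:
            text, _ = self.entries.pop()
            self.used -= len(text)


def input_file(pool, name, ext):
    """Model opening a file: two 'type 1' strings, then one 'type 2' string."""
    n = pool.make_string(name)
    e = pool.make_string(ext)
    full = pool.make_string(name + ext)  # stands in for a_make_name_string
    pool.release(n)  # counts drop to zero, but the strings are
    pool.release(e)  # not at the top yet, so nothing is freed here
    return full


pool = CountedPool(pool_size=64)
for i in range(2400):
    full = input_file(pool, f"report{i:04d}", ".tex")
    pool.release(full)  # file closed: all three strings are purged
print(pool.used)  # 0
```

In this model the type 1 strings are freed together with the type 2
string at close time, provided nothing else (such as a new control
sequence name) was added above them in the meantime.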
>
> [ dek:
> I think the strings are also needed for font file names.
> For ordinary input files I put the special code into \S537
> [which CET1 disabled] so that the Math Reviews could input
> lots of files.
> Of course there's a workaround (using the operating system
> to concatenate files!) but otherwise all I can suggest is a
> local change-file routine that tries to reclaim string space
> when closing files if the unneeded strings are still at the
> end of the string pool. You could introduce a new array
> indexed by 1..max_in_open to keep relevant status information
> if it isn't already present (see \S304).
> ]
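
DEK's suggested local change (reclaim the name when the file is closed,
if it is still at the end of the pool) might look roughly like the
following. This is a minimal Python sketch under stated assumptions, not
code from tex.web: StringPool, name_of_level, and MAX_IN_OPEN are
hypothetical stand-ins, start_input and end_file_reading are toy versions
of the procedures named above, and only a single name string per file is
modelled.

```python
MAX_IN_OPEN = 15  # assumed limit on simultaneously open input files


class StringPool:
    """Toy pool: strings are stored end to end and numbered from 0."""

    def __init__(self, pool_size):
        self.pool_size = pool_size
        self.chars = []       # all pool characters
        self.str_start = [0]  # start offset of each string, plus an end mark

    def make_string(self, text):
        if len(self.chars) + len(text) > self.pool_size:
            raise MemoryError("string pool overflow")
        self.chars.extend(text)
        self.str_start.append(len(self.chars))
        return len(self.str_start) - 2  # number of the new string

    def is_top(self, s):
        return s == len(self.str_start) - 2

    def flush_string(self):
        # Drop the most recently made string and give back its space.
        self.str_start.pop()
        del self.chars[self.str_start[-1]:]


# Hypothetical status array indexed by 1..MAX_IN_OPEN.
name_of_level = [None] * (MAX_IN_OPEN + 1)


def start_input(pool, in_open, filename):
    name_of_level[in_open] = pool.make_string(filename)


def end_file_reading(pool, in_open):
    s = name_of_level[in_open]
    if s is not None and pool.is_top(s):
        pool.flush_string()  # reclaim only if nothing was added since
    name_of_level[in_open] = None


pool = StringPool(pool_size=32)
for i in range(2400):
    start_input(pool, 1, f"report{i:04d}.tex")
    end_file_reading(pool, 1)
print(len(pool.chars))  # 0
```

If a file defines a never-before-seen control sequence while it is open,
that name lands above the file name in the pool and the file name is not
reclaimed, which is the limitation Chris describes.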