Hi,

did anybody already notice that pdftex gets slower and slower when processing huge documents? I recently wrote a few Perl scripts which create huge LaTeX files containing data from spectrum analyzers, and I'm using the pgfplots package in order to visualize the results. On my humble Windows machine it takes about one second to create such a plot. This would be ok for a handful of plots, but I have to create thousands of them, and I noticed that the amount of time needed to produce one single plot increases steadily.

I first thought that it's a pgfplots issue, but I created test files which confirm that it's a pdftex problem. Download the tiny Perl script

  http://tug.org/~kotucha/mktestfiles.gz

and then try:

  tex --shell-escape testplain.tex
  pdftex --shell-escape testplain.tex

What you'll see on screen is the output of "date +%s" whenever an \input file has been processed (thus, it works only on UNIX). You'll see that at the beginning Knuth's TeX and pdfTeX need approximately the same amount of time to process one \input file. But while Knuth's TeX needs the same amount of time for every \input file, the time pdfTeX needs increases steadily.

Regards, Reinhard
Hi Reinhard,

Reinhard Kotucha wrote:
You'll see that at the beginning Knuth's TeX and pdfTeX need approximately the same amount of time to process one \input file. But while Knuth's TeX needs the same amount of time to process any \input file, the amount of time pdfTeX needs increases steadily.
I tried with pdftex only (1.50.0-alpha-20080414); I get about 5-second intervals for the pdftex version, but this seems to be constant. At what point does it start slowing down for you?

Best wishes, Taco
On Fri, May 29, 2009 at 01:05:40AM +0200, Reinhard Kotucha wrote:
Hi, did anybody observe already that pdftex gets slower and slower when processing huge documents?
[...]
You'll see that at the beginning Knuth's TeX and pdfTeX need approximately the same amount of time to process one \input file. But while Knuth's TeX needs the same amount of time to process any \input file, the amount of time pdfTeX needs increases steadily.
I confirm this symptom with pdftex-1.40.9. Not sure what is the cause yet... Thanh
The Thanh Han wrote:
I confirm this symptom with pdftex-1.40.9. Not sure what is the cause yet...
Ah now I have it too (after a much longer run). Is it string pool growth maybe (from the \input file names)? Best wishes, Taco
The Thanh Han wrote:
I confirm this symptom with pdftex-1.40.9. Not sure what is the cause yet...
i did a couple of tests (on windows so with other test files) and it looks like the problem is with the 'filename' (actually this is a kind of known issue); the filename ends up in the pool (and string) memory and although (if i remember right) etex reclaims that memory it looks like in pdftex (and luatex) that's not the case; eventually i ran out of pool in both

putting the sample text in a macro (i.e. reading it once) works fine so it is related to the "\input <filename>"

Hans
On Fri, May 29, 2009 at 11:02:22AM +0200, Hans Hagen wrote:
i did a couple of tests (on windows so with other test files) and it looks like the problem is with the 'filename' (actually this is a kind of known issue); the filename ends up in the pool (and string) memory and although (if i remember right) etex reclaims that memory it looks like in pdftex (and luatex) that's not the case; eventually i ran out of pool in both
putting the sample text in a macro (i.e. reading it once) works fine so it is related to the "\input <filename>"
indeed it must be something with string recycling (in tex.ch) during "\input something"; however both tex & pdftex (in texlive) seem not to release the filename string, but pdftex takes longer to process later \input's. FWIW, pdftex doesn't create new strings in Reinhard's example (it runs in dvi mode). Thanh
The Thanh Han wrote:
indeed it must be something with string recycling (in tex.ch) during "\input something"; however both tex & pdftex (in texlive) seem not to release the filename string, but pdftex takes longer to process later \input's. FWIW, pdftex doesn't create new strings in Reinhard's example (it runs in dvi mode).
i remember a discussion about this filename issue and that it is supposed to be reclaimed when it's the last thing added to the pool (some etex thing) but i cannot find anything on the web

Hans
An interesting note from DEK on a related problem:
[ dek: I think the strings are also needed for font file names. For ordinary input files I put the special code into \S537 [which CET1 disabled] so that the Math Reviews could input lots of files. Of course there's a workaround (using the operating system to concatenate files!) but otherwise all I can suggest is a local change-file routine that tries to reclaim string space when closing files if the unneeded strings are still at the end of the string pool. You could introduce a new array indexed by 1..max_in_open to keep relevant status information if it isn't already present (see \S304). ]
Philip TAYLOR (Ret'd) wrote:
An interesting note from DEK on a related problem:
[...]
ah, so we're on the right track

just a thought .. can it be related to the synctex patch as it also keeps track of files?

Hans
Not familiar with syncTeX, but I think that a closer look at the revision history of \S537 (and liaison with CET1 to see why he disabled Don's patch) might help ...

** Phil.
--------
Hans Hagen wrote:
ah, so we're on the right track
just a thought .. can it be related to the synctex patch as it also keeps track of files
On 29 May 2009 Hans Hagen wrote:
i remember a discussion about this filename issue and that it is supposed to be reclaimed when it's the last thing added to the pool (some etex thing) but i cannot find anything on the web
Hi,

I remember vaguely that there had been a discussion a couple of years ago. The string pool problem was very well known at that time; the old version of Keith Reckdahl's epslatex.pdf suggested specifying full filenames (with extensions) as arguments to \includegraphics. Otherwise \includegraphics would use \openin in order to search for a file with an appropriate extension, which increases the string pool. At that time the small size of the string pool was problematic, not processing speed.

Olaf Weber once said that he planned to solve this problem, but I never heard anything about it again. If this is exactly the problem we are talking about, it seems that he fixed it, because Knuth's TeX doesn't have this problem any more. There had been a few changes in pdfTeX afterwards, one being the integration of e-TeX, but there had also been an upgrade of TeX itself (3.1415926).

Actually, my problem has nothing to do with file names. My Perl script produces one big LaTeX file (30 MB) and after the preamble, \input isn't used at all.

Does the string pool contain the hash for control sequences? This would explain the behavior. From texmf.cnf:

% Max number of characters in all strings, including all error messages,
% help texts, font names, control sequences. These values apply to TeX and MP.
pool_size = 1250000

Maybe PGF creates a lot of control sequences at runtime, using \csname and \endcsname in macros. This would increase the control sequence hash and then it takes more time to find a particular macro. But if they are created dynamically at runtime, they are created within a group (\begin{tikzpicture}...\end{tikzpicture}) and I expect that everything created within a particular group is removed from the hash after \endgroup.

I have no idea what's happening.

BTW, please excuse me that I didn't provide a better test file. I assumed that the problem is caused by large files, not by \input. But the idea was to create a test file which works with pdfTeX and DEK's TeX in order to find out whether they behave differently.

With pgfplots the problem is more obvious: I started pdftex in the late morning. When I later looked into the log file I noticed that it had already created more than 700 pages. Thus, I assumed that I would get a result after the lunch break. But it finished in the evening, two minutes before I had to shut down the computer, otherwise I would have missed the train to Hannover.

It would be fine if the problem could be solved, one way or the other. But since I'm obviously the only one who encountered this problem and the problem obviously exists for years, I propose to change nothing before TeX Live 2009 is released.

Regards, Reinhard
On Sat, 30 May 2009, Reinhard Kotucha wrote:
Does the string pool contain the hash for control sequences? This would explain the behavior. From texmf.cnf:
% Max number of characters in all strings, including all error messages,
% help texts, font names, control sequences. These values apply to TeX and MP.
pool_size = 1250000
fwiw, the string pool increases only slowly with every input file, and the most recent contents after each ship_out are:

...exsometext.tex./sometext.texsometext.tex./sometext.texsometext.tex./sometext.tex

Regards, Hartmut
Reinhard Kotucha wrote:
Does the string pool contain the hash for control sequences? This would explain the behavior. From texmf.cnf:
% Max number of characters in all strings, including all error messages,
% help texts, font names, control sequences. These values apply to TeX and MP.
pool_size = 1250000
Maybe PGF creates a lot of control sequences at runtime, using \csname and \endcsname in macros. This would increase the control sequence hash and then it takes more time to find a particular macro.
But if they are created dynamically at runtime, they are created within a group (\begin{tikzpicture}...\end{tikzpicture}) and I expect that everything created within a particular group is removed from the hash after \endgroup.
I no longer remember if that is the case, but DEK's words that I cited earlier were extracted from a much longer message, which I now repeat /verbatim/ below; it certainly makes references to the impact of control sequences on the string pool.

** Phil.
--------
File name overflow of string pool
[ Since this report, I have seen a couple of other reports on this topic in the electronic discussion lists, mostly from Europe. While not a bug, it can certainly be a serious inconvenience. A couple of the reports have mentioned building nonstandard versions of TeX with a separate pool of file names; not good for compatibility. ]
Date: Fri, 12 Jul 91 19:06 +0200
From: "Johannes L. Braams"
Subject: Bug/misfeature in TeX?

We have run into a problem with TeX. We have an application where we would like to \input about 2400 files. We can't do that because TeX runs out of string pool space. This application is rather important because it concerns the reports the lab has to make each quarter of a year.
When I studied TeX the program to find out what happens when a file is being \input I found that the name of the file is stored in string pool. AND it never gets removed from the string pool (as far as I could find out). What I don't understand is why filenames are written to string pool in the first place. Isn't it possible to use some kind of stack or array mechanism to store filenames? It should then be possible to free the memory used to store a filename when the file gets closed and the filename is no longer needed.
Do you know the answer or someone who does? Or is this a bug? I would rather call it a design flaw actually.
Regards,
Johannes Braams
PTT Research Neher Laboratorium, P.O. box 421, 2260 AK Leidschendam, The Netherlands. Phone: +31 70 3325051 E-mail: JL_Braams@pttrnl.nl Fax: +31 70 3326477

-------
Date: Mon, 15 Jul 91 01:59:22 BST
From: Chris Thompson
Subject: Re: Bug/misfeature in TeX?

I agree that it's a design flaw, not a bug. People do keep falling over it from time to time, though, so maybe Don could be asked to think about it again. I suspect, however, that there is no easy fix, for reasons I will explain below.
Johannes asks why the names go in the string pool in the first place: the answer to that is "why not?"... it is the convenient place to keep more or less arbitrarily long strings. The space occupied by things added to the string pool can be reclaimed, provided it is done straight away, before other parts of TeX have been exercised that may add other strings (especially, control sequence names) to the pool. There are two types of file name to think about (neither of which are reclaimed at the moment, with one partial---and wrong---exception):
1. The 1, 2 or 3 strings generated by |scan_file_name|. Usually these are used in some implementation-dependent way to open a file, and maybe then as arguments to |*_make_name_string|, and are then never needed again; and all this would usually happen straight away. Exception: deferred (non-\immediate) \openout's.
2. The string generated by |*_make_name_string|. For things like the log and DVI files, this has to be kept for ever (printing them is almost the last thing TeX does). The interesting case, however, is \input. The string is printed (immediately), and then stored in the |name_field| of the current input stack entry. *Almost* the only thing TeX uses it for thereafter is as a number > 17, to distinguish the case of an input level being an \input file (as opposed to terminal input or a \read level). The sole exception is in section 84, where it is used to deal with the "E" response to the error prompt: in distribution TeX as part of a message, but in practice as input to the implementation-dependent way of invoking an editor.
(BEGIN ASIDE
The ``partial and wrong exception'' is the code in section 537 introduced by change 283. |start_input| reclaims the space occupied by the result of |a_make_name_string|, if that is still the top string in the pool, and replaces it by the `name' part of the results of |scan_file_name|. I have had to undo this "fix" in my implementations: the *only* thing that the ``file name'' is needed for is as an argument to the editor, and it is an unwarranted assumption that
a. The values of the `area' and `extension' parts of the name are irrelevant to that purpose, and
b. The output of |a_make_name_string| doesn't contain extra information, available as a result of the opening process, that may also be relevant.
END ASIDE)
In theory the contents of the strings of type 2 for \input files could be kept on some sort of separate stack, as Johannes suggests (parallel to the |input_file| and |line_stack| arrays), but this would be quite convoluted and involve a lot of duplication of code. More plausible would be an attempt to reclaim them if they are still the top string in the pool when the file is closed (in |end_file_reading|); this isn't so unlikely in cases like Johannes'... presumably not all 2400 files can use never-before-encountered control sequences, or he will be running out of other things besides the string pool!
The strings of type 1 create a difficulty, however, unless they can be got rid of just after the call of |a_make_name_string| (a certain amount of permuting of the string pool would be required to do that). If they, also, are to be got rid of when the file is closed, again subject to the condition that they are at the top of the pool, one will have to (at least) remember how many of them there were.
Some of this would, in fact, be rather easier in METAFONT than TeX. METAFONT's string pool entries have a use count, and reclaiming space consists of purging consecutive entries at the top of the pool whose use counts have all fallen to zero. One could easily arrange that the strings of type 1 had use counts of zero after the opening process was over, and that the strings of type 2 for "input" files had a use count of 1 which was decremented to 0 at close time; then the right things would happen more or less automatically. However, TeX *doesn't* have such use counts, and I don't really suppose Don is going to introduce them in order to solve this problem.
Chris Thompson -------
[ dek: I think the strings are also needed for font file names. For ordinary input files I put the special code into \S537 [which CET1 disabled] so that the Math Reviews could input lots of files. Of course there's a workaround (using the operating system to concatenate files!) but otherwise all I can suggest is a local change-file routine that tries to reclaim string space when closing files if the unneeded strings are still at the end of the string pool. You could introduce a new array indexed by 1..max_in_open to keep relevant status information if it isn't already present (see \S304). ]
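To make the pool mechanics in the quoted messages concrete, here is a minimal sketch in C (hypothetical: TeX's real code is Pascal/WEB, and make_string, reclaim_if_top and the sizes below merely mirror the tex.web ideas discussed above, they are not its actual interface):

  #include <stdio.h>
  #include <string.h>

  #define POOL_SIZE 1250000             /* cf. pool_size in texmf.cnf        */
  #define MAX_STRINGS 10000

  static char str_pool[POOL_SIZE];      /* all characters, packed end to end */
  static int  str_start[MAX_STRINGS];   /* str_start[s]: offset of string s  */
  static int  str_ptr  = 0;             /* number of strings made so far     */
  static int  pool_ptr = 0;             /* first free character in str_pool  */

  static int make_string(const char *s) /* append a new string at the top    */
  {
      int len = (int)strlen(s);
      memcpy(str_pool + pool_ptr, s, (size_t)len);
      str_start[str_ptr] = pool_ptr;
      pool_ptr += len;
      return str_ptr++;                 /* the new string's number           */
  }

  /* Reclaiming works ONLY for the topmost string, as with tex.web's
   * flush_string: anything buried under a later string stays forever. */
  static int reclaim_if_top(int s)
  {
      if (s != str_ptr - 1)
          return 0;                     /* not on top: cannot reclaim        */
      --str_ptr;
      pool_ptr = str_start[str_ptr];
      return 1;
  }

  int main(void)
  {
      int fname = make_string("./sometext.tex"); /* name stored by \input    */
      make_string("\\somenewcs");       /* a cs name defined while reading   */
      /* The \input file is now closed, but its name is no longer the
       * topmost string, so its space is lost: the pool only grows.    */
      int ok = reclaim_if_top(fname);
      printf("reclaimed: %d, pool_ptr: %d\n", ok, pool_ptr);
      return 0;
  }

This is exactly the condition in DEK's suggestion above: a change file can give back the file-name strings at close time, but only when nothing (typically a new control-sequence name) has been appended after them.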
Reinhard Kotucha wrote:
Actually, my problem has nothing to do with file names. My Perl script produces one big LaTeX file (30 MB) and after the preamble, \input isn't used at all.
my impression is that your problem relates to pdf memory as well as literals accumulating (and temp taking mem)
Maybe PGF creates a lot of control sequences at runtime, using \csname and \endcsname in macros. This would increase the control sequence hash and then it takes more time to find a particular macro.
negligible, and not increasing during the run i guess
With pgfplots the problem is more obvious: I started pdftex in the late morning. When I later looked into the log file I noticed that it had already created more than 700 pages. Thus, I assumed that I would get a result after the lunch break. But it finished in the evening, two minutes before I had to shut down the computer, otherwise I would have missed the train to Hannover.
did you test with luatex? when we ran into large mem consumption when playing with punk fonts (using literals etc etc) some improvements were made to the literal handling (no longer via pool) which saves mem and is faster too (we ran into tricky boundary conditions too then)
It would be fine if the problem could be solved, one way or the other. But since I'm obviously the only one who encountered this problem and the problem obviously exists for years, I propose to change nothing before TeX Live 2009 is released.
Hans
On Sat, 30 May 2009, Hans Hagen wrote:
my impression is that your problem relates to pdf memory as well as literals accumulating (and temp taking mem)
he does dvi.
Maybe PGF creates a lot of control sequences at runtime, using \csname and \endcsname in macros. This would increase the control sequence hash and then it takes more time to find a particular macro.
negligible, and not increasing during the run i guess
seems so, hash fill level doesn't change. seems also that it has nothing to do with \input! When i replace \input by .so and use soelim -t to expand the input files into one huge file

  4505098590 30. Mai 01:14 testplain-so.tex

the slow-down is about the same. Also top shows no increase of pdftex memory usage. Weird.

Regards, Hartmut
seems the longer one lets it run in gdb, the more likely one, on a randomly entered ^C, ends up around the while loop within the chunk @
On Sat, May 30, 2009 at 02:41:30AM +0200, Hartmut Henkel wrote:
seems the longer one lets it run in gdb, the more likely one, on a randomly entered ^C, ends up around the while loop within the chunk @
so maybe it has something to do with the rover and some linked list there.
yes this seems to be the cause. I re-ran the test with gprof and got this report:

tex:
,--------
| Each sample counts as 0.01 seconds.
|   %   cumulative    self                  self     total
|  time   seconds    seconds       calls   s/call   s/call  name
|  37.61    111.05    111.05    90100100     0.00     0.00  ztrybreak
|   9.33    138.60     27.55   708325535     0.00     0.00  zgetnode
|   8.90    164.88     26.27           1    26.27   294.72  maincontrol
|   8.16    188.97     24.09  1253467536     0.00     0.00  zbadness
|   5.09    203.99     15.02     4566038     0.00     0.00  zhpack
|   4.88    218.39     14.40     4566038     0.00     0.00  hlistout
|   3.98    230.15     11.76      100000     0.00     0.00  zlinebreak
`--------

pdftex:
,--------
| Each sample counts as 0.01 seconds.
|   %   cumulative    self                  self     total
|  time   seconds    seconds       calls   s/call   s/call  name
|  36.99    174.44    174.44   708325535     0.00     0.00  zgetnode
|  27.09    302.17    127.73    90100100     0.00     0.00  ztrybreak
|   6.30    331.89     29.72           1    29.72   469.55  maincontrol
|   5.08    355.83     23.94  1253467536     0.00     0.00  zbadness
|   3.29    371.33     15.50     4566038     0.00     0.00  zhpack
|   3.28    386.79     15.46     4566038     0.00     0.00  hlistout
|   3.09    401.34     14.55      100000     0.00     0.00  zlinebreak
`--------

so indeed pdftex seems to spend a lot of time allocating memory. The number of zgetnode() calls is the same (708325535 in both cases), but pdftex's calls took more time...

Regards, Thanh
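Those two hot spots match a free-list search that degenerates as the list fills with fragments. For orientation, a toy first-fit allocator in the spirit of get_node()'s rover search, written as a hypothetical C sketch (not the web2c sources; the names, guard conditions and byte-based sizes are made up):

  #include <stdio.h>
  #include <stddef.h>

  typedef struct block {
      size_t size;                /* bytes available in this free block */
      struct block *link;         /* next free block (circular list)    */
  } block;

  static block *rover = NULL;     /* where the previous search left off */

  static void add_free(void *mem, size_t size)
  {
      block *b = mem;
      b->size = size;
      if (rover == NULL) { b->link = b; rover = b; }
      else { b->link = rover->link; rover->link = b; }
  }

  static void *get_node(size_t s)
  {
      if (rover == NULL)
          return NULL;
      block *p = rover;
      do {                        /* first fit, starting at the rover   */
          if (p->size >= s + sizeof(block)) {
              p->size -= s;       /* split: the low part stays free,    */
              rover = p;          /* the high part is handed out        */
              return (char *)p + p->size;
          }
          p = p->link;
      } while (p != rover);       /* wrapped all the way round:         */
      return NULL;                /* real TeX would grow lo_mem here    */
  }

  int main(void)
  {
      static char arena[1 << 16];
      add_free(arena, sizeof arena);
      void *a = get_node(48);     /* e.g. an hlist-sized request        */
      void *b = get_node(16);     /* e.g. a glue-sized request          */
      printf("%p %p\n", a, b);
      return 0;
  }

The sketch only shows the search: every call walks free blocks onward from the rover, so once thousands of too-small fragments sit on the list, each zgetnode() call pays for scanning past them even though the number of calls stays the same.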
On Sat, 30 May 2009, The Thanh Han wrote:
so indeed pdftex seems to spend a lot of time allocating memory. The number of zgetnode() calls is the same (708325535 in both cases), however pdftex calls took more time...
and it looks like 1.40.6 is ok, no slowdown. the main difference may be that 1.40.6 has no synctex yet.

Regards, Hartmut
Hartmut Henkel wrote:
and it looks like 1.40.6 is ok, no slowdown. the main difference may be that 1.40.6 has no synctex yet.
I am running a luatex test, and it does not seem to slow down (it is not even at 20% yet though, this processor is fairly old) Best wishes, Taco
On Sat, 30 May 2009, Taco Hoekwater wrote:
Hartmut Henkel wrote:
On Sat, 30 May 2009, The Thanh Han wrote:
so indeed pdftex seems to spend a lot of time allocating memory. The number of zgetnode() calls is the same (708325535 in both cases), however pdftex calls took more time...
and it looks like 1.40.6 is ok, no slowdown. the main difference may be that 1.40.6 has on synctex yet.
I am running a luatex test, and it does not seem to slow down (it is not even at 20% yet though, this processor is fairly old)
looks like you are using another get_node() method. In pdftex it seems that it rather often goes to @
Hartmut Henkel wrote:
looks like you are using another get_node() method. In pdftex it seems that it rather often goes to @
There is not much time-consuming code in this chunk, but the "got restart" may give a lengthy rovering each time. At least when i increase t:=lo_mem_max+100000 so as to grow mem in larger chunks, the speed slowdown is much less tremendous. But then, why does it need so much more memory?
Perhaps the node merge is failing too often due to fragmentation?

I've quit luatex now (I got bored) and at exit it reported the following node usages:

? x
node memory in use: 601252 words out of 2495936
rapidly available: 1:8, 2:8, 3:551575, 4:326, 5:726, 6:4, 7:1621, 9:14, 10:22 nodes
current usage: 92 hlist, 1 vlist, 1 rule, 1891 glue, 1000 kern, 6 penalty, 7201 glyph, 293 glue_spec, 1 temp, 2 local_par, 1 dir nodes
Output written on testplain.dvi (457908 pages, 590050796 bytes).
Transcript written on testplain.log.

Of course, as I made it quit mid-page, the "in use" and "current usage" reports are not too valuable, but notice that the "rapidly available" report says that there are now 551575 nodes of size 3 available. That looks ridiculous, because nothing ever asks for that many 3-word nodes, but it helps to know that whenever a small bit of otherwise useless memory is discovered by luatex's get_node(), this bit of memory is automatically transferred to the "rapidly available" list. That probably accounts for almost all of the 550K 3-word nodes.

I suspect something similar happens in pdftex, and that this problem has somehow become more apparent with the addition of synctex (synctex makes many nodes larger, so the chance of too-small chunks becomes higher). Does that make sense?

Best wishes, Taco
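The "rapidly available" mechanism described here can be sketched as follows (hypothetical C, not luatex's actual source; park_fragment and quick_get are invented names): a leftover fragment of k words is parked on a per-size stack, so it never slows down the rover scan again, but it can only satisfy a request for exactly k words, which is how half a million 3-word fragments can pile up:

  #include <stddef.h>

  #define MAX_QUICK 10

  typedef struct qnode { struct qnode *next; } qnode;

  static qnode *avail[MAX_QUICK + 1];  /* avail[k]: free k-word fragments */

  static void park_fragment(void *mem, int words)
  {
      qnode *q = mem;                  /* the fragment itself becomes     */
      q->next = avail[words];          /* a cell on the per-size stack    */
      avail[words] = q;
  }

  static void *quick_get(int words)    /* O(1) when a fragment matches    */
  {
      qnode *q = avail[words];
      if (q != NULL)
          avail[words] = q->next;
      return q;
  }

pdftex's get_node has no such escape hatch, so if synctex's larger nodes make splits leave more unusable remainders, those remainders stay on the variable-size list and every later search crawls past them.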
On 30 May 2009 Taco Hoekwater wrote:
I've quit luatex now (I got bored) [...]
Yes, it's boring indeed. I've created a less boring test file now. It only works with pdflatex. If you want to use it with luatex, you have to replace the two lines marked with %%%%%%%%%%%%%%%%%%%% by the luatex equivalents of \pdfresettimer and \pdfelapsedtime.

http://tug.org/~kotucha/testpgfplots.tex.gz

It takes 35 seconds to create the ten plots here.

BTW, I tried the old test file with luatex snapshot-0.25.4-2008081309 (TL-2008) and I did not have the impression that it slows down. But the new test file will demonstrate the problem much faster.

Regards, Reinhard
On Sat, May 30, 2009 at 12:14:09PM +0200, Hartmut Henkel wrote:
and it looks like 1.40.6 is ok, no slowdown. the main difference may be that 1.40.6 has no synctex yet.
I confirm that pdftex without synctex doesn't have this issue (tested with 1.40.9). Regards, Thanh
The Thanh Han wrote:
I confirm that pdftex without synctex doesn't have this issue (tested with 1.40.9).
hm, i thought that synctex, when not enabled, would not have a performance drawback

Hans
Hans Hagen wrote:
hm, i thought that synctex, when not enabled, would not have a performance drawback
It shouldn't, but it could be. Variable-sized memory allocation is rather tricky.

I believe there is a relatively easy way to test the fragmentation issue: in get_node and free_node, just allocate and free a fixed node size if the requested size is below a certain threshold (10 memory words should be ok; I think the only potentially larger nodes in pdftex are \parshapes, and those don't apply in this case). This wastes memory, but because now all nodes have the same size, there won't be any fragmentation problems. If it still slows down, something else is causing the problem, right?

Best wishes, Taco
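As a sketch of that test (hypothetical C; in pdftex it would be a small change to the Pascal/WEB get_node and free_node, and SMALL_NODE_LIMIT is an invented name), the whole idea is one rounding function applied on both the allocate and the free path:

  #include <stddef.h>

  #define SMALL_NODE_LIMIT 10    /* words; only \parshape nodes are bigger */

  static size_t padded(size_t s) /* round small requests to one fixed size */
  {
      return s < SMALL_NODE_LIMIT ? SMALL_NODE_LIMIT : s;
  }

  /* in get_node(s):     search and allocate with padded(s) instead of s
   * in free_node(p, s): return the block with padded(s) instead of s    */

With every small node the same size, any freed block fits any later request, so fragmentation is impossible by construction; if the slowdown survives this change, the cause must be elsewhere.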
On 30 May 2009 Hartmut Henkel wrote:
and it looks like 1.40.6 is ok, no slowdown.
With 1.40.6 and the new test file I get:

===176066===
===188925===
===211720===
===228937===
===218111===
===227037===
===242101===
===262462===
===269177===
===279845===

The numbers denote scaled seconds (65536 scaled seconds = 1 second).

Regards, Reinhard
participants (6): Hans Hagen, Hartmut Henkel, Philip TAYLOR (Ret'd), Reinhard Kotucha, Taco Hoekwater, The Thanh Han