Hello,
Well, I experimented a bit and am now a lot wiser. I also have confirmation of what I suspected all along: my computer's internal hard disk is much too slow.
How to measure the performance of virtual memory:
\directlua0{lua.bytecode[100000000] = function() end}
works ok here (but i have a 4 gig machine);
Well, yes, since it allocates about 1.6 GB. With 4 GB (what luxury ;-), virtual memory is not needed. I have "only" 1 GB, so ...
technically it should only allocate one lua state; on the other hand, it may be that a non-sparse array is used for the register housekeeping
No, right now it allocates 100000000 bytecode registers. It does not use a sparse array.
On a more serious matter, though, there is a buffer overflow in llualib.c, function set_bytecode: If `sizeof(bytecode)*(k+1)' is larger than UINT_MAX, a numeric overflow occurs and not enough memory is allocated in line 162. When initializing the newly allocated bytecode registers afterwards, random memory is overwritten.
Example (without exploit, of course):
Register number = 1000000000:
such overruns also happen when you do wild things with lua code; tex does not manage that memory
It is not a question of managing the memory, but of not allocating enough memory: LuaTeX assumes it has allocated A bytes, but in fact only allocates B bytes (B < A).
another overflow can happen when piping data to tex (tex.print); if you collect 2 gig of data there you may also run into problems
Then a check should be added so the problem can be caught before an overflow happens.
(I killed the process in the latter experiment, as it still consumed too much memory [about 1.9 GB]. But even though the memory requirements were higher than in the first example, in this case the allocation did not fail, because the overflow produced a smaller allocation of `only' 1935228944 bytes.)
Register number = 3000000000:
test \directlua0{lua.bytecode[1024*1024*1024*1024] = function() end}
This results in a numeric overflow, and, assuming the index is simply reduced modulo 2^32, bytecode register 2^40 mod 2^32 = 0 is set (2^40 = 2^8 * 2^32 is an exact multiple of 2^32; 2^8 = 256 is the quotient, not the remainder). In this case, the numeric overflow also influences which register is accessed and not only the amount of memory allocated, so there is no buffer overflow.
test \directlua0{for i=1,100000000 do lua.bytecode[i] = function() end end }
This one only allocates 100,000,000 registers, so only 1.6 GB are required.
seems to work ok here, but
test \directlua0{lua.bytecode[3000000000] = function() end}
test \directlua0{for i=1,3000000000 do lua.bytecode[i] = function() end end }
report a problem with a negative value, so it may be that there is a problem there (not sure if Taco tests the max value)
! LuaTeX error negative values not allowed. l.7 ...{lua.bytecode[3000000000] = function() end}
Interesting. I notice you use array access, whereas in my experiments I used lua.setbytecode. Maybe there is a difference.
IMO it would be sensible to limit the number of bytecode registers to UINT_MAX/sizeof(bytecode) or -- to be platform-independent in light of 64-bit processors[2] -- to (2^32-1)/sizeof(bytecode).
such a limitation is not that meaningful; one can safely have 1 million bytecode functions, yet bomb with just 10 that use large data structures;
The problem is not that LuaTeX runs out of memory, but that it overwrites memory it has not allocated (see http://en.wikipedia.org/wiki/Buffer_overflow). And in the current implementation (since it does not use a sparse array), there is a fixed upper limit on how many bytecode registers can be used before such a buffer overflow occurs, no matter how much memory the machine has (although on 64-bit machines the limit is really high).
there is no control over the lua end of the game; also, when using a lot of data, in practice the garbage collectors will bring your system down (making it so slow that one aborts the job);
Interesting. In what use cases did you observe this behaviour?
luatex kind of assumes modern memory management
This is surely a given, since LuaTeX runs on Unix and Windows.
and machines with memory in the gig range
This, I think, is a bit optimistic and IMO limits the usefulness of LuaTeX, especially since TeX has much lower requirements.
[3] I think this is the result of the signal handler LuaTeX installs, which displays an error message. But it seems that this message has been overwritten as well.
error messages and catching errors with proper messages is on the agenda for next year, but segfaults and crashes indeed need to be caught
The problem is not catching, but that LuaTeX, when accessing the bytecode register, has overwritten almost all memory (the first 12 bytes of each 16-byte block) with zeros. This is the result of the buffer overflow. But if the error message were inside LuaTeX's machine code (as opposed to LuaTeX's data), IIRC it could not be overwritten, since in RAM an application's machine code is kept separate from its data and is also write-protected.
(i can at least imagine a practical limit of 64K bytecode registers)
Yes! This would solve the buffer overflow, and it would make the current non-sparse-array implementation viable.
Hans
Jonathan