Hello,

(yes, it's me again ;-)

Yesterday I noticed that when \dump-ing a format file, the Lua states are not dumped as well, discarding all initializations made while creating the format.

This is not a problem, since the initializations can be put into \everyjob, and of course from a technical angle it would be really difficult to dump a Lua state to the format file (requiring access to the internals of the Lua VM which can change at any time), but IMO this should be noted in the manual.

Jonathan
Jonathan Sauer wrote:
Hello,
(yes, it's me again ;-)
Yesterday I noticed that when \dump-ing a format file, the Lua states are not dumped as well, discarding all initializations made while creating the format.
this is the reason for the bytecode registers; these are saved; you can store lua code in there and initialize that at runtime (when the format is loaded); there is no pre-cooked behaviour, it's under your control; depending on the amount of code to be loaded, it makes sense to use this mechanism or not (loading lua code is rather fast)

Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
Hello,
Yesterday I noticed that when \dump-ing a format file, the Lua states are not dumped as well, discarding all initializations made while creating the format.
this is the reason for the bytecode registers; these are saved; you can store luacode in there and initialize that at runtime (when the format is loaded); there is no pre-cooked behaviour, it's under your control; depending on the amount of code to be loaded, it makes sense to use this mechanism or not (loading lua code is rather fast)
Thanks for your answer! After reading up on bytecode registers in the manual, one question is left, though: How many of them are there? The usual 65536? Thanks in advance, Jonathan
Jonathan Sauer wrote:
Hello,
Yesterday I noticed that when \dump-ing a format file, the Lua states are not dumped as well, discarding all initializations made while creating the format. this is the reason for the bytecode registers; these are saved; you can store luacode in there and initialize that at runtime (when the format is loaded); there is no pre-cooked behaviour, it's under your control; depending on the amount of code to be loaded, it makes sense to use this mechanism or not (loading lua code is rather fast)
Thanks for your answer! After reading up on bytecode registers in the manual, one question is left, though:
How many of them are there? The usual 65536?
more -)

Hans
Hello,
Thanks for your answer! After reading up on bytecode registers in the manual, one question is left, though:
How many of them are there? The usual 65536?
more -)
Well, I experimented a bit and am now a lot wiser. Also, I have confirmation on what I thought all along: My computer's internal harddisk is much too slow:

How to measure the performance of virtual memory:

  \directlua0{lua.bytecode[100000000] = function() end}

On a more serious matter, though, there is a buffer overflow in llualib.c, function set_bytecode: If `sizeof(bytecode)*(k+1)' is larger than UINT_MAX, a numeric overflow occurs and not enough memory is allocated in line 162. When initializing the newly allocated bytecode registers afterwards, random memory is overwritten.

Example (without exploit, of course):

Register number = 1000000000:

  $ luatex
  This is luaTeX, Version 3.141592-beta-0.11.2-2007091918 (Web2C 7.5.6)
  **\directlua0{lua.setbytecode(1000000000,function() end)}
  luatex(515) malloc: *** vm_allocate(size=3115102208) failed (error code=3)
  luatex(515) malloc: *** error: can't allocate region
  luatex(515) malloc: *** set a breakpoint in szone_error to debug
  fatal: memory exhausted (xmalloc of 3115098128 bytes).

(note that `3115098128' is already the result of an overflow. On my machine, sizeof(bytecode) is 16, so 16000000016[1] bytes should have been allocated [or tried to]. 16000000016 mod 2^32 = 3115098128)

Register number = 2000000000:

  $ luatex
  This is luaTeX, Version 3.141592-beta-0.11.2-2007091918 (Web2C 7.5.6)
  **\directlua0{lua.setbytecode(2000000000,function() end)}
  Killed

(I killed the process in the latter experiment, as it still consumed too much memory [about 1.9GB]. But: Even though the memory requirements have been higher than in the first example, in this case the allocation did not fail, as the overflow created a smaller allocation, `only' 1935228944 bytes)

Register number = 3000000000:

  $ luatex
  This is luaTeX, Version 3.141592-beta-0.11.2-2007091918 (Web2C 7.5.6)
  **\directlua0{lua.setbytecode(3000000000,function() end)}
  luatex(528) malloc: *** Deallocation of a pointer not malloced: 0x1b4d0c0; This could be a double free(), or free() called with the middle of an allocated block; Try setting environment variable MallocHelp to see tools to help debug
  ^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@e on^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@ ^^@^^@^^@^^@^^@^^@^^@^^@ Ple^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@^^@ther^^@^^@^^@^^@^^@^^@^^@ript^^@^ ^@^^@ ^^@^^@^^@^^@^^@^^@^^@^^@^^@

(followed by a more-or-less crash [strangely, LuaTeX was still active afterwards[3]] due to accessing invalid memory. In this case, only 755359760 bytes were allocated, even though 48000000016 were required. The ^^@ demonstrate how memory has been overwritten with zeros)

IMO it would be sensible to limit the number of bytecode registers to UINT_MAX/sizeof(bytecode) or -- to be platform-independent in the light of 64 bit processors[2] -- (2^32-1)/sizeof(bytecode).

[1] LuaTeX always allocates memory for one additional register

[2] Where UINT_MAX might well be 2^64-1

[3] I think, this is the result of the sig-handler LuaTeX installs which displays an error message. But it seems that this message has been overwritten as well. Except for `e on', `Ple', `ther' and `ript' -- let's hope LuaTeX does not pull an Event Horizon (from the movie) here. But I think this is just because bytecode.alloc is not initialized, so the last four bytes of every 16 byte block of LuaTeX's memory are not overwritten.

Of course, all of this is void, if the current bytecode register implementation is only temporary.

Jonathan
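The three byte counts in the report can be reproduced with a small C sketch. It is not the code from llualib.c: the function names are hypothetical, and the entry size of 16 is simply the value of sizeof(bytecode) reported in the message above. The sketch computes the size that would be needed in 64 bits, then truncates it to 32 bits, which is what a 32-bit size computation would effectively pass to the allocator.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed to hold registers 0..k when every entry takes elem_size
   bytes. Computed in 64 bits so that the product cannot wrap for the
   register numbers used in this thread. Hypothetical helper. */
static uint64_t bytes_needed(uint64_t k, uint64_t elem_size)
{
    assert(elem_size > 0 && elem_size <= 1024);
    assert(k < UINT32_MAX);
    return elem_size * (k + 1);
}

/* The same request as seen by a 32-bit size computation: converting to
   uint32_t keeps only the low 32 bits, i.e. the value modulo 2^32. */
static uint32_t bytes_requested_32(uint64_t k, uint64_t elem_size)
{
    return (uint32_t)bytes_needed(k, elem_size);
}
```

With an entry size of 16, register numbers 1000000000, 2000000000 and 3000000000 give 3115098128, 1935228944 and 755359760 bytes respectively, matching the figures quoted in the report.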
Jonathan Sauer wrote:
Well, I experimented a bit and am now a lot wiser. Also, I have confirmation on what I thought all along: My computer's internal harddisk is much too slow:
How to measure the performance of virtual memory:
\directlua0{lua.bytecode[100000000] = function() end}
works ok here (but i have a 4 gig machine); technically it should only allocate one lua state; on the other hand, it may be that a non sparse array is used for the register housekeeping
On a more serious matter, though, there is a buffer overflow in llualib.c, function set_bytecode: If `sizeof(bytecode)*(k+1)' is larger than UINT_MAX, a numeric overflow occurs and not enough memory is allocated in line 162. When initializing the newly allocated bytecode registers afterwards, random memory is overwritten.
Example (without exploit, of course):
Register number = 1000000000:
such overruns happen also when you do wild things with lua code, tex does not manage that memory

another overflow can be in the piping data to tex (tex.print) .. if you collect 2 gig data there you may also run into problems
(I killed the process in the latter experiment, as it still consumed too much memory [about 1.9GB]. But: Even though the memory requirements have been higher than in the first example, in this case the allocation did not fail, as the overflow created a smaller allocation, `only' 1935228944 bytes)
Register number = 3000000000:
test \directlua0{lua.bytecode[1024*1024*1024*1024] = function() end}
test \directlua0{for i=1,100000000 do lua.bytecode[i] = function() end end }

seems to work ok here but

test \directlua0{lua.bytecode[3000000000] = function() end}
test \directlua0{for i=1,3000000000 do lua.bytecode[i] = function() end end }

report a problem with a negative value, so it may be that there is a problem there (not sure if taco tests the max value)

! LuaTeX error negative values not allowed.
l.7 ...{lua.bytecode[3000000000] = function() end}
IMO it would be sensible to limit the number of bytecode registers to UINT_MAX/sizeof(bytecode) or -- to be platform-independent in the light of 64 bit processors[2] -- (2^32-1)/sizeof(bytecode).
such a limitation is not that meaningful; one can have 1 million bytecode functions safely but 10 using large datastructures and bombing; there is no control over the lua end of the game; also, when using much data, in practice the garbage collectors will bring down your system (so slow that one will abort the job); luatex kind of assumes modern memory management and machines with memory in the gig range
[3] I think, this is the result of the sig-handler LuaTeX installs which displays an error message. But it seems that this message has been overwritten as well.
error messages and catching errors with proper messages is on the agenda for next year but segfaults and crashes indeed need to be cached (i can at least imagine a practical limit of 64K bytecode registers)
Of course, all of this is void, if the current bytecode register implementation is only temporary.
not that i know -)

Hans
Hello,
Well, I experimented a bit and am now a lot wiser. Also, I have confirmation on what I thought all along: My computer's internal harddisk is much too slow:
How to measure the performance of virtual memory:
\directlua0{lua.bytecode[100000000] = function() end}
works ok here (but i have a 4 gig machine);
Well, yes, since it allocates about 1.6 GB. With 4 GB (what luxury ;-), virtual memory is not needed. I have "only" 1 GB, so ...
technically it should only allocate one lua state; on the other hand, it may be that a non sparse array is used for the register housekeeping
No, right now it allocates 100000000 bytecode registers. It does not use a sparse array.
On a more serious matter, though, there is a buffer overflow in llualib.c, function set_bytecode: If `sizeof(bytecode)*(k+1)' is larger than UINT_MAX, a numeric overflow occurs and not enough memory is allocated in line 162. When initializing the newly allocated bytecode registers afterwards, random memory is overwritten.
Example (without exploit, of course):
Register number = 1000000000:
such overruns happen also when you do wild things with lua code, tex does not manage that memory
It is not a question of managing the memory, but of not allocating enough memory: LuaTeX assumes it has allocated A bytes, but in fact only allocates B bytes (B < A).
another overflow can be in the piping data to tex (tex.print) .. if you collect 2 gig data there you may also run into problems
Then a check should be added so the problem can be caught before an overflow happens.
(I killed the process in the latter experiment, as it still consumed too much memory [about 1.9GB]. But: Even though the memory requirements have been higher than in the first example, in this case the allocation did not fail, as the overflow created a smaller allocation, `only' 1935228944 bytes)
Register number = 3000000000:
test \directlua0{lua.bytecode[1024*1024*1024*1024] = function() end}
This results in a numeric overflow: 2^40 is an exact multiple of 2^32, so 2^40 mod 2^32 = 0 and presumably bytecode register 0 is set (if my calculations are not mistaken). In this case, the numeric overflow also influences what register is accessed and not only the amount of memory allocated, so there is no buffer overflow.
test \directlua0{for i=1,100000000 do lua.bytecode[i] = function() end end }
This one only allocates 100,000,000 registers, so only 1.6 GB are required.
seems to work ok here but
test \directlua0{lua.bytecode[3000000000] = function() end}
test \directlua0{for i=1,3000000000 do lua.bytecode[i] = function() end end }
report a problem with a negative value, so it may be that there is a problem there (not sure if taco tests the max value)
! LuaTeX error negative values not allowed.
l.7 ...{lua.bytecode[3000000000] = function() end}
Interesting. I notice you use array access, whereas in my experiments I used lua.setbytecode. Maybe there is a difference.
IMO it would be sensible to limit the number of bytecode registers to UINT_MAX/sizeof(bytecode) or -- to be platform-independent in the light of 64 bit processors[2] -- (2^32-1)/sizeof(bytecode).
such a limitation is not that meaningful; one can have 1 million bytecode functions safely but 10 using large datastructures and bombing;
The problem is not that LuaTeX runs out of memory, but that it overwrites memory it has not allocated: http://en.wikipedia.org/wiki/Buffer_overflow

And in the current implementation (since it does not use a sparse array), there is a fixed upper limit on how many bytecode registers can be used before such a buffer overflow occurs, no matter how much memory the machine has (although on 64 bit machines the limit is really high).
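One way to turn this into a controlled failure is to check the size computation before allocating. The following C sketch only illustrates the idea and is not LuaTeX's implementation: checked_mul_size, grow_registers and the max_index cap are hypothetical names, and a real fix would also have to report the error to the user.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Store count * elem_size in *out and return 1 if the product fits in a
   size_t; return 0 (leaving *out untouched) if it would wrap around. */
static int checked_mul_size(size_t count, size_t elem_size, size_t *out)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return 0;
    *out = count * elem_size;
    return 1;
}

/* Grow a dense register array so that index k becomes valid.
   Returns NULL, leaving the old block untouched, if k exceeds the
   hypothetical cap max_index, if the byte count would overflow, or if
   the allocator itself fails. */
static void *grow_registers(void *old, size_t k, size_t elem_size,
                            size_t max_index)
{
    size_t bytes;

    assert(elem_size > 0);
    if (k > max_index || k == SIZE_MAX)   /* k + 1 must not wrap to 0 */
        return NULL;
    if (!checked_mul_size(k + 1, elem_size, &bytes))
        return NULL;
    return realloc(old, bytes);
}
```

With a cap such as 65535, a request for register 1000000000 is refused up front instead of producing an undersized block, and the overflow check still protects the computation if the cap is ever raised.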
there is no control over the lua end of the game; also, when using much data, in practice the garbage collectors will bring down your system (so slow that one will abort the job);
Interesting. In what use cases did you observe this behaviour?
luatex kind of assumes modern memory management
This is surely a given, since LuaTeX runs on Unix and Windows.
and machines with memory in the gig range
This I think is a bit optimistic and IMO limits the usefulness of LuaTeX. Especially since TeX has much lower requirements.
[3] I think, this is the result of the sig-handler LuaTeX installs which displays an error message. But it seems that this message has been overwritten as well.
error messages and catching errors with proper messages is on the agenda for next year but segfaults and crashes indeed need to be cached
The problem is not caching, but that LuaTeX, when accessing the bytecode register, has overwritten almost all memory (the first 12 bytes of each 16 byte block) with zeros. This is the result of the buffer overflow. But: If the error message was inside LuaTeX's machine code (as opposed to LuaTeX's data), IIRC it could not be overwritten, since in RAM, an application's machine code is kept separately from its data, and also is write-protected.
(i can at least imagine a practical limit of 64K bytecode registers)
Yes! This would solve the buffer overflow. And it would make the current non-sparse-array-implementation viable.
Hans
Jonathan
Jonathan Sauer wrote:
there is no control over the lua end of the game; also, when using much data, in practice the garbage collectors will bring down your system (so slow that one will abort the job);
Interesting. In what use cases did you observe this behaviour?
extensive use of the token callback is one i remember, and a previous implementation of node callbacks passes tables instead of userdata which was also slow (has to do with the moment the collector steps in); by now i have developed a kind of feeling where/how to speed up things
luatex kind of assumes modern memory management
This is surely a given, since LuaTeX runs on Unix and Windows.
and machines with memory in the gig range
This I think is a bit optimistic and IMO limits the usefulness of LuaTeX. Especially since TeX has much lower requirements.
sure, but luatex is not tex; for large jobs (say a couple of hundred pages with many advanced open type fonts, many graphics, color, hyperlinks or whatever takes memory) topping out at 400-500 meg is not uncommon and given today's machines we find that acceptable; it also depends on what kind of trickery one does
The problem is not caching, but that LuaTeX, when accessing the bytecode register, has overwritten almost all memory (the first 12 bytes of each 16 byte block) with zeros. This is the result of the buffer overflow.
sure, and that needs to be fixed; however, the kind of message (and controlling that) is for later

hans
Hans Hagen wrote:
Jonathan Sauer wrote:
Well, I experimented a bit and am now a lot wiser. Also, I have confirmation on what I thought all along: My computer's internal harddisk is much too slow:
How to measure the performance of virtual memory:
\directlua0{lua.bytecode[100000000] = function() end}
works ok here (but i have a 4 gig machine); technically it should only allocate one lua state; on the other hand, it may be that a non sparse array is used for the register housekeeping
Yes, and that itself is a temporary measure. Nevertheless, the integer overflow is serious enough to warrant a fix in the next beta (it will be a while before I get to cleaning up the bytecode array).
another overflow can be in the piping data to tex (tex.print) .. if you collect 2 gig data there you may also run into problems
Not so sure. Lua itself would probably have stopped you before that, but I can't test that now as it would take forever on this machine.
IMO it would be sensible to limit the number of bytecode registers to UINT_MAX/sizeof(bytecode) or -- to be platform-independent in the light of 64 bit processors[2] -- (2^32-1)/sizeof(bytecode).
such a limitation is not that meaningful;
In practice it is, as it prevents uncontrolled crashes.
[3] I think, this is the result of the sig-handler LuaTeX installs which displays an error message. But it seems that this message has been overwritten as well.
It is actually the message in the crash handler in the C runtime. Best wishes, Taco
Hello,
Nevertheless, the integer overflow is serious enough to warrant a fix in the next beta
Great! (even though I think the risk of an exploit is quite low)
(it will be a while before I get to cleaning up the bytecode array).
No problem. But IMO this is an important piece of information: To know if something will be cleaned up or is there to stay (modulo bug fixes). This is a problem I stumbled on several times now: There is a TODO list in the LuaTeX manual, but this list, it seems, only contains the missing features, but not the features awaiting cleanup. So it is difficult to determine if something is a bug or simply the result of a temporary implementation, and therefore to decide if it should be reported on this list. It would help a lot to note temporary implementations in the source with e.g. "TODO: CLEANUP".
another overflow can be in the piping data to tex (tex.print) .. if you collect 2 gig data there you may also run into problems
Not so sure. Lua itself would probably have stopped you before that, but I can't test that now as it would take forever on this machine.
Hans, since you have more than enough RAM for everyone (more than 640K ;-), can you test this?
[3] I think, this is the result of the sig-handler LuaTeX installs which displays an error message. But it seems that this message has been overwritten as well.
It is actually the message in the crash handler in the C runtime.
Are you sure? LuaTeX installs its own sig-handler which displays the usual TeX error prompt. I exited the crashed LuaTeX by typing 'x' and <return>.
Best wishes, Taco
Jonathan
Jonathan Sauer wrote:
Hans, since you have more than enough RAM for everyone (more than 640K ;-), can you test this?
i quit the 100 million loop after half an hour but it had not segfaulted then yet

Hans
participants (3)
- Hans Hagen
- Jonathan Sauer
- Taco Hoekwater