Taco Hoekwater wrote:
David Kastrup wrote:
Hi, I've been wondering about several things with regard to Unicode/utf-8.
[...] Thanks for the explanations. Certainly reassuring.
Junk in, junk out. LuaTeX is not a file format validator, but a typesetting engine. That is what I think, anyway.
I have no problem with "junk in, junk out": after all, that is basically what my proposal of turning invalid input bytes into the transparent output characters was intended to achieve. What I wanted to avoid is "junk in, crash out", or "junk in, anything may happen". After all, "anything may happen" can imply a security risk. It would be nice if the output of Lua callbacks could not cause memory corruption or similar damage.

It would help if LuaTeX worked only with utf-8 sequences that it had generated itself from character codes (or through some equivalent process providing the minimal guarantees about the byte sequences passed into LuaTeX that LuaTeX needs for efficient operation).

I know that one can crash a TeX executable with things like \def~{\if~}~, but those "just" cause a stack overflow and don't form a security risk on typical architectures (some DOS TeXs protect explicitly against this, IIRC).

--
David Kastrup
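The guarantee asked for above (the engine only ever handling byte sequences it produced itself from character codes, with invalid input bytes turned into separate "transparent" items) can be sketched roughly as follows. This is a minimal illustration in Python, not LuaTeX's actual input code; the ("char", code point) / ("byte", value) token representation and the function names are hypothetical. Every byte that does not start a well-formed UTF-8 sequence (per the usual rules: no overlong forms, no surrogates, nothing above U+10FFFF) becomes its own byte token, and output is always re-generated from those tokens.

```python
from typing import List, Tuple

# Hypothetical token form: ("char", code point) or ("byte", raw invalid byte)
Token = Tuple[str, int]


def decode_utf8(data: bytes) -> List[Token]:
    """Split input into well-formed UTF-8 characters; any byte that does not
    start a well-formed sequence becomes a separate ("byte", b) token."""
    out: List[Token] = []
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:  # plain ASCII
            out.append(("char", b0))
            i += 1
            continue
        # Lead byte determines length, initial bits, and minimum code point
        if 0xC2 <= b0 <= 0xDF:
            length, cp, lowest = 2, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 <= 0xEF:
            length, cp, lowest = 3, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 <= 0xF4:
            length, cp, lowest = 4, b0 & 0x07, 0x10000
        else:  # stray continuation byte or invalid lead byte
            out.append(("byte", b0))
            i += 1
            continue
        tail = data[i + 1:i + length]
        # Truncated sequence or missing continuation bytes
        if len(tail) < length - 1 or any(not 0x80 <= c <= 0xBF for c in tail):
            out.append(("byte", b0))
            i += 1
            continue
        for c in tail:
            cp = (cp << 6) | (c & 0x3F)
        # Reject overlong forms, UTF-16 surrogates, and values above U+10FFFF
        if cp < lowest or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            out.append(("byte", b0))
            i += 1
            continue
        out.append(("char", cp))
        i += length
    return out


def encode_tokens(tokens: List[Token]) -> bytes:
    """Regenerate bytes from tokens: characters are re-encoded from their
    code points, invalid bytes are passed through unchanged."""
    out = bytearray()
    for kind, value in tokens:
        if kind == "char":
            out += chr(value).encode("utf-8")
        else:
            out.append(value)
    return bytes(out)
```

Since overlong forms are rejected, re-encoding a valid character yields exactly its original bytes, so `encode_tokens(decode_utf8(x)) == x` for any input: junk in, the same junk out, but the engine itself never has to interpret a malformed sequence.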