[Dev-luatex] valid Unicode character treatment?

David Kastrup dak at gnu.org
Tue Apr 24 16:34:51 CEST 2007


Taco Hoekwater <taco at elvenkind.com> writes:

> David Kastrup wrote:
>> Hi, I've been wondering about several things with regard to
>> Unicode/utf-8.

[...]

Thanks for the explanations.  Certainly reassuring.

> Junk in, junk out. LuaTeX is not a file format validator, but a
> typesetting engine. That is what I think, anyway.

I have no problem with "junk in, junk out": after all, that was
basically what my proposal to turn invalid input bytes into
transparent output characters was meant to achieve.

What I wanted to avoid is "junk in, crash out".  Or "junk in, anything
may happen".  After all, "anything may happen" can imply a security
risk.  It would be nice if the output of Lua callbacks could not cause
memory corruption or similar.  If LuaTeX were to work only with utf-8
sequences it had generated itself from character codes (or some
equivalent process providing those minimal guarantees about the byte
sequences passed into LuaTeX that LuaTeX needs for efficient
operation), this would be helpful.
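
To make the "minimal guarantees" idea concrete, here is a rough sketch in
Python (not LuaTeX code) of a decoder that never fails on bad input: valid
utf-8 sequences become ordinary character codes, and every byte that is not
part of a valid sequence is passed through as its own code. The name
decode_lenient and the constant RAW_BASE are assumptions made for this
illustration only; in particular, mapping raw bytes to values just above the
Unicode range is one possible choice, not something taken from the LuaTeX
sources.

```python
RAW_BASE = 0x110000  # hypothetical: first value above the Unicode range


def decode_lenient(data: bytes) -> list[int]:
    """Decode utf-8 strictly; map each invalid byte to RAW_BASE + byte."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                      # plain ASCII
            out.append(b)
            i += 1
            continue
        # Lead byte decides the number of continuation bytes (need), the
        # initial payload bits (cp), and the smallest code point that may
        # legally use this length (lo), which rules out overlong forms.
        if 0xC2 <= b <= 0xDF:
            need, cp, lo = 1, b & 0x1F, 0x80
        elif 0xE0 <= b <= 0xEF:
            need, cp, lo = 2, b & 0x0F, 0x800
        elif 0xF0 <= b <= 0xF4:
            need, cp, lo = 3, b & 0x07, 0x10000
        else:                             # stray continuation or bad lead
            out.append(RAW_BASE + b)
            i += 1
            continue
        tail = data[i + 1:i + 1 + need]
        if len(tail) == need and all(0x80 <= t <= 0xBF for t in tail):
            for t in tail:
                cp = (cp << 6) | (t & 0x3F)
            in_range = lo <= cp <= 0x10FFFF
            is_surrogate = 0xD800 <= cp <= 0xDFFF
            if in_range and not is_surrogate:
                out.append(cp)
                i += 1 + need
                continue
        # Truncated, overlong, surrogate, or out of range: emit the lead
        # byte as a raw byte and rescan from the next byte.
        out.append(RAW_BASE + b)
        i += 1
    return out
```

With a front end of this kind, whatever a callback hands back is either a
well-formed character code or an explicitly marked raw byte, so the code
behind it never has to cope with a malformed sequence.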

I know that one can crash a TeX executable with things like
\def~{\if~}~
but those "just" cause a stack overflow and don't form a security risk
on typical architectures (some DOS TeXs protect explicitly against
this IIRC).

-- 
David Kastrup

