Taco Hoekwater
Arthur Reutenauer wrote:
Instead, LuaTeX barfs on "\^^9d" and similar ASCII _transliterations_ of characters which happen to be legal _characters_ in Unicode (though not legal _bytes_ in utf-8).
Good spot. I had already noticed there were many problems with LaTeX, but I thought they were mainly due to pattern files (and I gave up very early on LaTeX in LuaTeX anyway). I suppose the ^^ notation should yield a UTF-8 encoded sequence and not an individual byte (XeTeX indeed is perfectly happy with it).
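To illustrate the byte-versus-character distinction above, here is a minimal Python sketch. It only demonstrates the encoding arithmetic for code point 0x9D; it says nothing about how LuaTeX or XeTeX actually implement the ^^ notation, and the helper name is_valid_utf8 is made up for this example.

```python
# Code point 0x9D, as written with the ^^9d notation.
CODE_POINT = 0x9D

# Interpretation 1: a single raw byte. On its own this is a UTF-8
# continuation byte, so it is not a legal utf-8 sequence.
as_single_byte = bytes([CODE_POINT])

# Interpretation 2: the Unicode character U+009D, encoded as utf-8.
as_utf8_sequence = chr(CODE_POINT).encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: report whether data decodes as utf-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(as_single_byte.hex(), is_valid_utf8(as_single_byte))
    print(as_utf8_sequence.hex(), is_valid_utf8(as_utf8_sequence))
```

The second interpretation is the one suggested above: the character is legal in Unicode even though the lone byte is not legal utf-8.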
It worked before, so I probably messed up something along the way. It is safe to assume there will be a fix in the next snapshot.
Anyway: I think it is a safe assumption that LuaTeX should be able to deal with current versions of LaTeX (I think it would be a mistake to have to rely on Lambda). So the kind of utf-8 support (OTP or something) used for Omega needs to be somewhat optional. I don't have any clue about the current implementation, but the number of error messages I got suggests there are several areas involved.

Here is my take on what would constitute a sane environment (some of that probably is already implemented in XeTeX):

Single characters: encoded in Unicode (UCS-21 or similar).

Input line buffer: array of single characters. Characters are created from input by using the input coding system of the file (basically one of 8-bit, utf-8, at some later point in time possibly also things like utf-16-le or utf-16-be). LaTeX would be fixed to "transparent" at first, which would make it work like before. However, one would eventually want to add something like a utf8l input encoding in order to have it behave more sanely.

String space: utf-8 encoded. This is probably incompatible with previous code, but saves space.

Log and console output: switchable between utf-8 and 8-bit, probably depending on locale and/or inherited from the mode of the current input file. In "8-bit" mode, obviously all characters with a code point above 255 need to get output as ^^^^abcd or ^^^^^^01abcd or similar.

Write streams: similar. It might be possible to generally write utf-8, but then it might be a good idea to add a byte order mark at the start of files so that \input on such files will flip the coding system appropriately.

I really need to take a look at XeTeX.

-- David Kastrup
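The "8-bit" output convention and the byte order mark idea from the proposal above can be sketched roughly as follows. This is an illustrative Python sketch, not LuaTeX or XeTeX code. The function names to_8bit_log and sniff_input_encoding are hypothetical, and two choices are assumptions rather than part of the proposal: printable ASCII is passed through unchanged, and other code points below 256 are written in the two-digit ^^xx form.

```python
import codecs


def to_8bit_log(text: str) -> str:
    """Render text for an 8-bit log, escaping what cannot be shown."""
    pieces = []
    for ch in text:
        cp = ord(ch)
        if 0x20 <= cp <= 0x7E:
            # Assumption: printable ASCII goes out unchanged.
            pieces.append(ch)
        elif cp <= 0xFF:
            # Assumption: remaining 8-bit values use the ^^xx form.
            pieces.append(f"^^{cp:02x}")
        elif cp <= 0xFFFF:
            # Four hex digits, as in ^^^^abcd.
            pieces.append(f"^^^^{cp:04x}")
        else:
            # Six hex digits, as in ^^^^^^01abcd.
            pieces.append(f"^^^^^^{cp:06x}")
    return "".join(pieces)


def sniff_input_encoding(first_bytes: bytes) -> str:
    """Pick an input coding system from a leading byte order mark.

    Falls back to "8-bit" (the "transparent" mode) when no mark is found.
    """
    if first_bytes.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if first_bytes.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if first_bytes.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return "8-bit"


if __name__ == "__main__":
    print(to_8bit_log("a\u0416\U0001abcd"))
    print(sniff_input_encoding(codecs.BOM_UTF8 + b"\\relax"))
```

Defaulting to "8-bit" when no mark is present keeps existing files working as before, which matches the intent of starting LaTeX out in the "transparent" mode.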