[Dev-luatex] Input and output character sets
taco at elvenkind.com
Thu Aug 17 19:56:33 CEST 2006
Idris Samawi Hamid wrote:
> I'll put something together using the gamma module (see m-gamma.tex and
> type-omg.tex) sometime today. See also the off-list mail I sent you.
Meanwhile torture.tex has proved to be quite useful. It runs almost
OK now (one issue remains afaics). It should be possible to fix that
tomorrow, so that at least one non-trivial document is typeset
In case anyone is interested: the biggest problems are caused by the
move from the two separate homogeneous 'file i/o' models that are
used by pdfTeX and Aleph (resp. bytes and 16-bit shorts) to the
variable encoding (utf-8) that is used by LuaTeX internally.
For example, when TeX is searching for a control sequence name, it does
a sneak past the end of the name, and then it jumps back one item to
find the actual last character of the name. This does not work in UTF-8,
because if the last character was > 127, it has to back up two, three,
or even four items.
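
To illustrate the backing-up problem, here is a minimal sketch in Python. It is not the actual LuaTeX code, and the function name last_char_start is made up for this example. It assumes the buffer holds well-formed UTF-8 and relies only on the fact that every continuation byte has the bit pattern 10xxxxxx:

```python
def last_char_start(buf: bytes, end: int) -> int:
    """Return the index of the first byte of the character ending just before `end`.

    Hypothetical helper for illustration; assumes `buf` is well-formed UTF-8
    and that 1 <= end <= len(buf).
    """
    i = end - 1
    # UTF-8 continuation bytes look like 10xxxxxx, so skip back over them
    # until we reach a lead byte (or a plain ASCII byte).
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return i


# With a one-byte-per-character model, "back up one item" is always right.
ascii_name = "abc".encode("utf-8")
ascii_start = last_char_start(ascii_name, len(ascii_name))

# With UTF-8 the last character may occupy two, three, or four bytes.
two_byte_name = "caf\u00e9".encode("utf-8")       # last char is 2 bytes
three_byte_name = "a\u20ac".encode("utf-8")       # last char is 3 bytes
four_byte_name = "a\U0001F600".encode("utf-8")    # last char is 4 bytes
```

For the ASCII name the loop never runs and the result is simply end - 1, which is what the old byte-oriented code assumed everywhere.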
> This is interesting; it allows one to globally specify the input encoding
> prior/independent of the ocp-list. I've always just done that inside the
> ocp list itself in my own work (but then again I use multiple inputs
I've looked at this quite intensively over the past two days, and I
propose to drop this entire feature. It seems not to be heavily used,
the names used are 'abnormal' for TeX (e.g. \noDefaultInputMode is an
actual primitive), the feature is very likely to clash with, as well
as complicate, future callbacks to/from Lua scripting, and finally
both the interface and the implementation appear to be a badly rushed
job or an experiment only.
A fresh implementation of file encoding support using Lua makes more
sense to me (and will probably take less time to do than fixing the