# [Dev-luatex] unicode, i/o and math

Taco Hoekwater taco at elvenkind.com
Tue May 30 19:37:36 CEST 2006

Hi,

I have not added anything interesting to the luatex repository
today. I am working on Unicode support offline, but will not do
any commits until I can make it stop segfaulting (later this week,
hopefully).

You'll be happy to know that I have created a brand new executable:

luatangle

This altered version of (o)tangle writes string-ids starting from
twomillionninetyseventhousandonehundredandfiftytwo (2^21).

It follows that the legal range of \char arguments will become
0..(2^21-1), thus allowing for all of Unicode and some spare
space (2^21-1 == 0x1FFFFF; Unicode only goes up to 0x10FFFF, so
there will be room for almost a million non-unicode characters).
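
The arithmetic above can be checked with a few lines of Python. This is only an illustration; the constant names are made up for the example and are not luatex identifiers.

```python
# Illustrative constants; the names are assumptions, not luatex identifiers.
STRING_ID_BASE = 2 ** 21        # first string-id written by luatangle
MAX_CHAR = STRING_ID_BASE - 1   # largest legal \char argument
MAX_UNICODE = 0x10FFFF          # last Unicode code point

# Number of values above the Unicode range that still fit in 21 bits.
spare = MAX_CHAR - MAX_UNICODE

print(STRING_ID_BASE)   # 2097152
print(hex(MAX_CHAR))    # 0x1fffff
print(spare)            # 983040
```

The spare range is 0xF0000 values, which is where the "almost a million" comes from.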

As far as the core TeX-like program is concerned, all file i/o will
be done in utf-8, with one small exception: printing of the low
control bytes to the terminal will use ^^X notation for dangerous
byte values like ^^D and ^^Z. The internal printing code for log
and terminal will be split to allow this.
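
A minimal sketch of that terminal escaping, in Python for illustration only. The function name is hypothetical, and the set of "dangerous" values is an assumption (everything below 0x20 except newline, plus 0x7F); the mail itself only names ^^D and ^^Z.

```python
def caret_escape(text: str) -> str:
    # Hypothetical helper: escape control characters in TeX's ^^X notation.
    # Assumption: codes below 0x20 (except newline) and 0x7F are "dangerous".
    out = []
    for ch in text:
        code = ord(ch)
        if (code < 0x20 and ch != "\n") or code == 0x7F:
            # For codes below 128, ^^X flips bit 6: 0x04 -> 'D', 0x1A -> 'Z'.
            out.append("^^" + chr(code ^ 0x40))
        else:
            # Everything else passes through and is written as utf-8 later.
            out.append(ch)
    return "".join(out)


print(caret_escape("a\x04b\x1a"))   # a^^Db^^Z
```

Non-ASCII characters are left alone here, on the assumption that the terminal can display utf-8.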

User-level configuration of i/o re-encoding will nevertheless still
be possible, through lua callbacks that can hook directly into the
i/o subroutines. No more translate-files, no more enctex, no more
8bit switches.

This is all rather straightforward, because \char, \catcode, \lccode,
\uccode, etc. are all 'singular values'.

But the math commands need attention as well. Often, these are a
single number representing a combination of <type>+<family>+
<charnum>, and with 21 bits for the <charnum> part alone, this syntax
is problematic.
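
To make the problem concrete, here is a sketch of the classic packing in Python (illustration only, the function name is made up). It assumes the traditional 15-bit \mathchar layout: 3 bits of type, 4 bits of family, 8 bits of character.

```python
def pack_mathchar(cls: int, family: int, char: int) -> int:
    # Hypothetical helper modelling the traditional 15-bit layout:
    # value = class * 4096 + family * 256 + char
    if not (0 <= cls <= 7 and 0 <= family <= 15):
        raise ValueError("class or family out of range")
    if not 0 <= char <= 0xFF:
        raise ValueError("character does not fit in 8 bits")
    return (cls << 12) | (family << 8) | char


print(hex(pack_mathchar(2, 2, 0x79)))   # 0x2279

try:
    pack_mathchar(2, 0, 0x2020)          # a character beyond 8 bits
except ValueError as err:
    print("rejected:", err)
```

Widening the character field to 21 bits would give 3 + 4 + 21 = 28 bits for a math character, and a delimiter (two family/character pairs) would need 3 + 2 * (4 + 21) = 53 bits, which no longer fits in a TeX integer.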

I propose to keep the old ones (with their old, limited syntax), but
also create new primitives with names like \luamathchar etc.

These will read _separate_ integers, so it will look like (e.g.)

\luamathchardef\dagger = 2 0 "2020 % unicode: 2020 DAGGER
\def\rangle{\luadelimiter 5 2 "69 3 "0B }
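
The separate integers correspond to the fields of the old packed numbers. As a sketch (Python, illustration only, hypothetical function name), this unpacks a classic 27-bit delimiter code into the five values the new syntax would read one by one:

```python
def unpack_delimiter(code: int):
    # Hypothetical helper for the traditional 27-bit delimiter layout:
    # class, small family, small char, large family, large char.
    cls = (code >> 24) & 0x7
    small_family = (code >> 20) & 0xF
    small_char = (code >> 12) & 0xFF
    large_family = (code >> 8) & 0xF
    large_char = code & 0xFF
    return cls, small_family, small_char, large_family, large_char


fields = unpack_delimiter(0x526930B)
print(fields)   # (5, 2, 105, 3, 11)
```

So the packed value "526930B reads as 5 2 "69 3 "0B in the proposed form, with each character field free to grow to 21 bits.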

Does that make sense to you? (Comments are very welcome.)

Greetings,
Taco