Hi guys,

I do not have enough time right now to write a long reply, but I wanted to mention quickly that we should not confuse character nodes and tokens|strings|csnames.

Character nodes signify stuff that will end up typeset, and the actual encoding does not matter anymore, so it makes little sense to use utf-8. (Besides, you need to reserve 4 bytes for utf-8 anyway if you want to stick with a one-character-per-node scheme.)

Why not just extend the char_node? I would propose something along these lines:

  @d char_node=25 {that's type(), guessing the number}
  @d char_node_size=4
  @d is_char_node(#) == (type(#)=char_node)
  @d character(#) == mem[#+1].rh {the glyph code in a |char_node|}
  @d font(#) == mem[#+1].lh {the font code in a |char_node|}
  @d language==subtype {the language for this |char_node|}
  @d unicode(#) == mem[#+2].int {a unicode code point}
  @d color(#) == mem[#+3].rh {mystery pointer}
  @d special(#) == mem[#+3].lh {another mystery pointer}

This is just a rough notion, not a definitive proposal!

Greetings,
Taco