Hi,

please forget what I said about utf8 -- it just crossed my mind that it might be possible to support unicode using 'extended attributes' of a char node, but I didn't give it much consideration.

My main intention is to support more attributes for a char node than just the font. I proposed using the font field as a pointer to a set of character attributes because I think it requires fewer changes -- the memory layout can stay unchanged, and all the tricks for dealing with character nodes can be left untouched. But I am not that sure -- it may be better to redefine the char node layout as Taco proposed.

Thanh

On Mon, Jul 18, 2005 at 03:08:29PM +0200, Taco Hoekwater wrote:
Hi guys,
I do not have enough time right now to write a long reply, but I wanted to mention quickly that we should not confuse character nodes and tokens|strings|csnames.
Character nodes signify stuff that will end up typeset, and the actual encoding does not matter anymore, so it makes little sense to use utf-8. (Besides, you need to reserve 4 bytes for utf-8 anyway if you want to stick with a one-character-per-node scheme.)
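To make the 4-byte remark concrete, here is a small Python sketch (purely illustrative, not taken from any TeX source) that counts how many bytes utf-8 needs for a few code points. The helper name utf8_length is made up for this example.

```python
# Illustrative only: how many bytes does utf-8 need for one code point?

def utf8_length(code_point: int) -> int:
    """Return the number of bytes utf-8 uses to encode a single code point."""
    return len(chr(code_point).encode("utf-8"))

# A few sample code points, from ASCII up to beyond the Basic Multilingual Plane.
samples = [0x41, 0xE9, 0x20AC, 0x1D465]
lengths = [utf8_length(cp) for cp in samples]

# A one-character-per-node scheme has to reserve room for the longest case.
worst_case = max(lengths)
```

Since the longest encoding decides the field width, a fixed-size node gains nothing from storing utf-8 instead of the code point itself.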
Why not just extend the char_node? I would propose something along these lines:
@d char_node=25 {that's type(), guessing the number}
@d char_node_size=4
@d is_char_node(#) == (type(#)=char_node)
@d character(#) == mem[#+1].rh {the glyph code in a |char_node|}
@d font(#) == mem[#+1].lh {the font code in a |char_node|}
@d language==subtype {the language for this |char_node|}
@d unicode(#) == mem[#+2].int {a unicode code point}
@d color(#) == mem[#+3].rh {mystery pointer}
@d special(#) == mem[#+3].lh {another mystery pointer}
This is just a rough notion, not a definitive proposal!
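To see how the four words would be used, the definitions above can be mimicked in a toy Python model. This is only a sketch under several assumptions: the MemoryWord class, the bump allocator and all function names are hypothetical, the type and subtype are assumed to sit in word 0 as for other nodes, and unlike the real |mem| array the integer field here does not overlap the two halfwords.

```python
# Toy model of the proposed 4-word char_node; not real TeX code.

class MemoryWord:
    """One word of |mem|: two quarterwords, two halfwords and an integer.
    In real TeX these fields overlap; here they are kept separate."""
    def __init__(self):
        self.b0 = 0       # type()
        self.b1 = 0       # subtype()
        self.lh = 0       # left halfword
        self.rh = 0       # right halfword
        self.integer = 0  # stands in for the .int field

CHAR_NODE = 25            # guessed type code, as in the proposal
CHAR_NODE_SIZE = 4

mem = [MemoryWord() for _ in range(64)]
next_free = 0             # hypothetical bump allocator, stands in for get_node

def new_char_node(f, c, lang, ucs):
    """Allocate a 4-word char_node and fill in the proposed fields."""
    global next_free
    p = next_free
    next_free += CHAR_NODE_SIZE
    mem[p].b0 = CHAR_NODE       # type(p)
    mem[p].b1 = lang            # language == subtype
    mem[p + 1].lh = f           # font(p)
    mem[p + 1].rh = c           # character(p), the glyph code
    mem[p + 2].integer = ucs    # unicode(p)
    mem[p + 3].rh = 0           # color(p), null for now
    mem[p + 3].lh = 0           # special(p), null for now
    return p

def is_char_node(p): return mem[p].b0 == CHAR_NODE
def character(p):    return mem[p + 1].rh
def font(p):         return mem[p + 1].lh
def language(p):     return mem[p].b1
def unicode(p):      return mem[p + 2].integer

# Example: glyph slot 12 in font 3, language 1, standing for U+00E9.
p = new_char_node(f=3, c=12, lang=1, ucs=0x00E9)
```

The point of keeping character() and unicode() apart is that the glyph code depends on the font encoding, while the code point does not.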
Greetings, Taco
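
For comparison, the alternative mentioned at the top of this message (keeping the two-field char node and letting the font field index a table of attribute sets) might be sketched as follows. Again this is only a rough Python illustration; CharAttributes, intern_attributes and the other names are invented for the example, and the choice of attributes is an assumption.

```python
# Rough sketch: the font field becomes an index into a table of attribute sets.
from typing import NamedTuple

class CharAttributes(NamedTuple):
    font: int
    language: int = 0
    color: int = 0

attr_table = []   # index -> CharAttributes
attr_index = {}   # CharAttributes -> index, so equal sets share one entry

def intern_attributes(attrs):
    """Return the table index for attrs, adding a new entry if needed."""
    idx = attr_index.get(attrs)
    if idx is None:
        idx = len(attr_table)
        attr_table.append(attrs)
        attr_index[attrs] = idx
    return idx

def make_char_node(attrs, c):
    """A char node stays a (font field, character field) pair."""
    return (intern_attributes(attrs), c)

def font_of(node):
    """Looking up the real font now costs one extra indirection."""
    return attr_table[node[0]].font

a = make_char_node(CharAttributes(font=3, language=1), 97)
b = make_char_node(CharAttributes(font=3, language=1), 98)
c = make_char_node(CharAttributes(font=3, language=2), 99)
```

Nodes with identical attributes share one table entry, so the node size stays the same, but every place that reads the font field has to go through the table.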