character attribute
Hi,

currently a character node in tex has only one attribute, i.e. the associated font. I'm wondering whether it makes sense to replace this attribute (font) with a pointer to some data structure representing a set of attributes. This way we can add further attributes of a character without messing around in the sources. Such attributes could be: font, color, location in the tex sources, the next bytes of a utf8 sequence, etc. How do you guys see it?

Thanh
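A minimal C sketch of the indirection described above, assuming the former font field becomes an index into a table of attribute sets. The names `char_attr`, `attr_table`, `intern_attr` and `node_font`, as well as the choice of fields, are hypothetical and only illustrate the idea; they are not taken from the pdftex sources.

```c
#include <stdint.h>

/* Hypothetical attribute record; the fields are illustrative only. */
typedef struct {
    uint16_t font;      /* font number, as stored in the node today */
    uint32_t color;     /* e.g. a packed RGB value */
    uint32_t src_line;  /* line in the tex source */
} char_attr;

#define MAX_ATTR_SETS 256
static char_attr attr_table[MAX_ATTR_SETS];
static int attr_count = 0;

/* Return the index of an identical attribute set, adding it if needed.
   Returns -1 when the table is full. */
static int intern_attr(char_attr a) {
    for (int i = 0; i < attr_count; i++) {
        if (attr_table[i].font == a.font &&
            attr_table[i].color == a.color &&
            attr_table[i].src_line == a.src_line)
            return i;
    }
    if (attr_count == MAX_ATTR_SETS) return -1;
    attr_table[attr_count] = a;
    return attr_count++;
}

/* The node keeps its two-field shape; the old font field holds an index. */
typedef struct {
    uint16_t attr;       /* index into attr_table (was: font) */
    uint16_t character;
} char_node;

/* Code that used to read the font field goes through the table instead. */
static uint16_t node_font(const char_node *n) {
    return attr_table[n->attr].font;
}
```

Sharing identical attribute sets through one table entry keeps the node size unchanged, which is the main appeal of this variant; the cost is one extra lookup wherever the font is read.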
this is indeed a much wanted feature, although i don't know what you mean with utf 8 here, since at the char node that has already been resolved

of course it needs some thinking, because it should be easy to combine existing functionality with user built stuff; also, additional features would be needed in order to handle resources at page boundaries and such (comparable to marks); fonts probably are hard coded features then

how do you envision this? Say that

\newfeature \mycolor

defines an abstract feature. At a certain point tex needs to let a macro package handle the features, i.e. pass all relevant features to a handler:

\handlefeature (gets called as many times as needed with \feature set)

this could be something direct or delayed (it depends on how it will influence par building, page breaking etc; for instance one may not want specials/literals to end up too soon in the process); i can even imagine that all features are delayed till shipout time; we need to distinguish between begin and end situations (maybe even at the level of lines, paragraphs, pages, splits ...)

it's not that easy to come up with a good generic approach

to some extent marks are ok to handle features, but they lack some functionality (like a proper way to reset the internal mark register), and they also can interfere

[taco and i had some discussion on this but i don't think we ever wrote down something ...]

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
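One possible reading of the begin/end distinction, sketched in C under the assumption that feature handling is delayed and runs over the per-character feature values of a finished line or page. The handler type, `handle_feature_runs` and `count_events` are hypothetical names; nothing here corresponds to an existing tex primitive.

```c
#include <stddef.h>

/* Hypothetical handler: called with the feature value and a begin/end flag. */
typedef void (*feature_handler)(int value, int is_begin, void *ctx);

/* Walk the feature values of consecutive characters and report the begin
   and end of every run. The value 0 means "feature not set". */
static void handle_feature_runs(const int *values, size_t n,
                                feature_handler h, void *ctx) {
    int current = 0;
    for (size_t i = 0; i < n; i++) {
        if (values[i] != current) {
            if (current != 0) h(current, 0, ctx);      /* end the old run */
            if (values[i] != 0) h(values[i], 1, ctx);  /* begin the new run */
            current = values[i];
        }
    }
    if (current != 0) h(current, 0, ctx);  /* close a run still open at the boundary */
}

/* Example handler that only counts events. */
typedef struct { int begins, ends; } event_count;

static void count_events(int value, int is_begin, void *ctx) {
    event_count *c = ctx;
    (void)value;
    if (is_begin) c->begins++; else c->ends++;
}
```

Closing the open run at the end of the sequence is the part that corresponds to handling resources at a line or page boundary; a real design would also have to reopen the feature on the next line or page.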
On Sun, 17 Jul 2005, Hans Hagen wrote:
this is indeed a much wanted feature, although i don't know what you mean with utf 8 here, since at the char node that has already been resolved
maybe one could keep character storage in (variable length) UTF format throughout? Actually, what would happen if the control sequence names were stored in the pool in their original UTF format, without input conversion? The hash would still work with UTF codes. Problems are, e.g., how to replace the eqtb area with the first 256 characters? Maybe also use a multi-byte sequence with a "single character" attribute there?

<snip>

Regards, Hartmut
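A small C sketch of why a byte-wise hash keeps working on names stored as UTF-8: it only sees a sequence of bytes, so multi-byte characters need no special treatment. The loop follows the general shape of the hash used for control sequence names in tex (double, add the next byte, reduce modulo a prime); `hash_name` is a hypothetical name and the value of `HASH_PRIME` is an assumption chosen for illustration.

```c
#include <stddef.h>

#define HASH_PRIME 1777  /* assumption: illustrative prime; the real value is a build-time constant */

/* Hash a control sequence name given as raw bytes, UTF-8 or not. */
static unsigned hash_name(const unsigned char *s, size_t len) {
    unsigned h = 0;
    for (size_t i = 0; i < len; i++) {
        h = (2 * h + s[i]) % HASH_PRIME;
    }
    return h;
}
```

For example, `hash_name((const unsigned char *)"caf\xC3\xA9", 5)` hashes a name ending in a two-byte UTF-8 sequence exactly like any other five-byte name.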
but in that case we should use unicode slots a la xetex or aleph ... something we need to do anyway when pdftex moves to open type

Hans
Hi guys,

I do not have enough time right now to write a long reply, but I wanted to mention quickly that we should not confuse character nodes and tokens|strings|csnames.

Character nodes signify stuff that will end up typeset, and the actual encoding does not matter anymore, so it makes little sense to use utf-8. (besides, you need to reserve 4 bytes for utf-8 anyway if you want to stick with a one-character-per-node scheme).

Why not just extend the char_node? I would propose something along these lines:

  @d char_node=25 {that's type(), guessing the number}
  @d char_node_size=4
  @d is_char_node(#) == (type(#)=char_node)
  @d character(#) == mem[#+1].rh {the glyph code in a |char_node|}
  @d font(#) == mem[#+1].lh {the font code in a |char_node|}
  @d language==subtype {the language for this |char_node|}
  @d unicode(#) == mem[#+2].int {a unicode code point }
  @d color(#) == mem[#+3].rh {mystery pointer}
  @d special(#) == mem[#+3].lh {another mystery pointer}

This is just a rough notion, not a definitive proposal!

Greetings, Taco
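For readers more at home in C than in WEB, the macro definitions above can be mirrored roughly as follows. This is only a sketch: the `memory_word` union, the arena size `MEM_SIZE` and the allocator `new_char_node` are assumptions made for the example, and the real memory word layout in tex differs in its details.

```c
#include <stdint.h>

#define MEM_SIZE       1024  /* assumption: tiny arena for the example */
#define CHAR_NODE      25    /* the guessed type number from the proposal */
#define CHAR_NODE_SIZE 4     /* words per character node */

/* Simplified memory word; each word is always accessed through one view. */
typedef union {
    struct { uint16_t b0, b1; int32_t rh; } qq;  /* type, subtype, link */
    struct { int32_t lh, rh; } hh;               /* two halfwords */
    int32_t i;                                   /* the .int view */
} memory_word;

static memory_word mem[MEM_SIZE];
static int mem_end = 1;  /* index 0 is reserved as the null pointer */

#define type(p)         (mem[(p)].qq.b0)
#define language(p)     (mem[(p)].qq.b1)      /* subtype reused as language */
#define link(p)         (mem[(p)].qq.rh)
#define is_char_node(p) (type(p) == CHAR_NODE)
#define character(p)    (mem[(p) + 1].hh.rh)  /* glyph code */
#define font(p)         (mem[(p) + 1].hh.lh)  /* font code */
#define unicode(p)      (mem[(p) + 2].i)      /* unicode code point */
#define color(p)        (mem[(p) + 3].hh.rh)  /* "mystery pointer" */
#define special(p)      (mem[(p) + 3].hh.lh)  /* "another mystery pointer" */

/* Hypothetical bump allocator; returns 0 (null) when the arena is full. */
static int new_char_node(int f, int c, int32_t u, int lang) {
    int p = mem_end;
    if (p + CHAR_NODE_SIZE > MEM_SIZE) return 0;
    mem_end += CHAR_NODE_SIZE;
    type(p) = CHAR_NODE;
    language(p) = (uint16_t)lang;
    link(p) = 0;
    font(p) = f;
    character(p) = c;
    unicode(p) = u;
    color(p) = 0;
    special(p) = 0;
    return p;
}
```

Compared with the pointer-to-attribute-set variant, this layout makes every character node four words long, but each attribute is reachable without an extra lookup.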
Hi,

please forget what I said about utf8 -- it just came across my mind that it might be possible to support unicode using 'extended attributes' of a char node, but I didn't give it much consideration.

My main intention is to support more attributes for a char node than just the font. I proposed using the font field as a pointer to a set of character attributes because I think it requires less change -- the memory layout can stay unchanged, and all tricks to deal with character nodes can be left untouched. But I am not that sure -- it may be better to redefine the char node layout like Taco proposed.

Thanh

On Mon, Jul 18, 2005 at 03:08:29PM +0200, Taco Hoekwater wrote:
Hi guys,
I do not have enough time right now to write a long reply, but I wanted to mention quickly that we should not confuse character nodes and tokens|strings|csnames.
Character nodes signify stuff that will end up typeset, and the actual encoding does not matter anymore, so it makes little sense to use utf-8. (besides, you need to reserve 4 bytes for utf-8 anyway if you want to stick with a one-character-per-node scheme).
Why not just extend the char_node? I would propose something along these lines:
  @d char_node=25 {that's type(), guessing the number}
  @d char_node_size=4
  @d is_char_node(#) == (type(#)=char_node)
  @d character(#) == mem[#+1].rh {the glyph code in a |char_node|}
  @d font(#) == mem[#+1].lh {the font code in a |char_node|}
  @d language==subtype {the language for this |char_node|}
  @d unicode(#) == mem[#+2].int {a unicode code point }
  @d color(#) == mem[#+3].rh {mystery pointer}
  @d special(#) == mem[#+3].lh {another mystery pointer}
This is just a rough notion, not a definitive proposal!
Greetings, Taco
participants (4)
- Hans Hagen
- Hartmut Henkel
- Taco Hoekwater
- The Thanh Han