At 18:59 02/01/2004, you wrote:
I've been struggling through, trying to learn Unicode in ConTeXt. It's been instructive, at least. (Hope to make a MyWay about it...)
Good.
There are a few weird things that made it difficult to learn, and I was wondering if someone could help explain why things are the way they are.
In unic-ini:

  \chardef\utfunihashmode=0 % 1 = enabled
Actually, if I understand things correctly, '1' means "disabled". That is what I wanted, since I had not yet created any Unicode vectors. So the internal documentation there seems wrong, and I would argue that the default (0) makes things harder for beginners.
Hm, did you look at the unic-001 etc. files? The trick is fast and efficient expansion without the need to define lots of named glyphs.
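Here is a plain e-TeX sketch of that idea. It is not ConTeXt's actual vector code, and all \demo... names are hypothetical. One macro covers a whole 256-character vector and receives the slot number as an argument, so individual glyphs need no names of their own. The vector is looked up with \ifcsname, and a fallback is used when it does not exist. Vector 32, slot 20 is U+2014 (32*256 + 20 = 8212 = 0x2014). Run it with etex:

```tex
% Plain e-TeX sketch; all \demo... names are hypothetical.
% One macro covers a whole 256-character vector: it receives the
% slot number instead of each glyph getting its own macro name.
\expandafter\def\csname demovector32\endcsname#1{%
  \ifnum#1=20 [em dash]\else [vector 32, slot #1]\fi}

% Use the vector macro if it exists, otherwise fall back.
% \ifcsname (e-TeX) tests without defining the name as \relax.
\def\demoglyph#1#2{% #1 = vector number, #2 = slot
  \ifcsname demovector#1\endcsname
    \csname demovector#1\endcsname{#2}%
  \else
    [no vector #1, slot #2]%
  \fi}

\message{\demoglyph{32}{20}}  % should print [em dash]
\message{\demoglyph{3}{177}}  % should print [no vector 3, slot 177]
\bye
```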
More confusingly, in font-uni:
Forget about that one. Although it's called unicode, it's actually a mechanism for the many vectors that are derived from or related to Unicode but are not entirely Unicode, i.e. CJK fonts.
  \def\enableunicodefont#1%
    {\definefontsynonym[\s!Unicode][\getvalue{\??uc#1\c!file}]%
     \def\unicodescale      {\getvalue{\??uc#1\c!schaal}}%
     \def\unicodeheight     {\getvalue{\??uc#1\c!hoogte}}%
     \def\unicodedepth      {\getvalue{\??uc#1\c!diepte}}%
     \def\unicodedigits     {\getvalue{\??uc#1\c!conversie}}%
     \def\handleunicodeglyph{\getvalue{\??uc#1\c!commando}}%
     %%%%%%%%%%% NEXT LINE
     \enableregime[unicode]%
     % the following \relax's are really needed
     \doifvalue{\??uc#1\c!interlinie}\v!ja\setupinterlinespace\relax
     \getvalue{\??uc#1\c!commandos}\relax}
The \enableregime[unicode] directly conflicts with the \enableregime[utf] that normally goes at the start of (some of my) documents. As it stands, with the regime hard-coded, users have to put an \enableregime[utf] *after* the font declaration. That's awkward.
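The sketch below shows the workaround described above. The font declaration itself is only a placeholder comment, because its exact form is not given here:

```tex
\enableregime[utf]   % usual input regime for UTF-8 source

% ... Unicode font declaration from font-uni goes here; it ends up
%     calling \enableunicodefont, which hard-codes \enableregime[unicode] ...

\enableregime[utf]   % has to be repeated *after* the font declaration
```

According to the description above, if the second \enableregime[utf] is left out, the unicode regime stays in effect from that point on.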
So don't use that mechanism; stick to the utf mechanism.
The last proposed change/complaint is back in unic-ini, and came from my attempts to match the main body font with the unicode font.
  \def\utfunifontglyph#1%
    {\xdef\unidiv{\number\utfdiv{#1}}%
     \xdef\unimod{\number\utfmod{#1}}%
     \ifnum#1<\utf@i
       %%%% \unicodeasciicharacter\unimod
       \char\unimod % \unicodeascii\unimod
     \else\ifcsname\@@univector\unidiv\endcsname
       \csname\doutfunihash{\unidiv}{#1}\endcsname
     \else % so, these can be different fonts !
       \unicodeglyph\unidiv\unimod % no \uchar (yet)
     \fi\fi}
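The \unidiv/\unimod pair splits a code point into a vector number and a slot within that 256-character vector. This sketch assumes that \utfdiv and \utfmod amount to integer division and remainder by 256 (their definitions are not shown here). The arithmetic is written in plain e-TeX with hypothetical names. It uses \divide because \divide truncates positive values, whereas \numexpr rounds (\numexpr 7/2\relax gives 4):

```tex
% Plain e-TeX sketch (run with etex); macro names are hypothetical.
% Split a code point into vector = code div 256, slot = code mod 256.
\newcount\demovector
\newcount\demoslot
\def\splitcodepoint#1{%
  \demovector=#1\relax
  \divide\demovector by 256     % truncating division
  \demoslot=\numexpr#1-\demovector*256\relax}

\splitcodepoint{8212}% U+2014 EM DASH
\message{vector=\the\demovector, slot=\the\demoslot}% vector=32, slot=20
\splitcodepoint{65}% U+0041 LATIN CAPITAL LETTER A
\message{vector=\the\demovector, slot=\the\demoslot}% vector=0, slot=65
\bye
```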
Basically, I'd like to use the \unicodeasciicharacter hook with this definition:
\def\unicodeasciicharacter{\uchar{0}}
(I'm not certain the above is release-quality code, but I've been testing it with a stripped-down \utfunifontglyph that should be functionally equivalent.)
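The proposed definition works because \uchar{0} is left waiting for its second argument. As a result, \unicodeasciicharacter\unimod becomes \uchar{0}\unimod, which presumably means vector 0 with slot \unimod. The plain TeX sketch below uses a hypothetical stand-in for \uchar that only reports its arguments instead of typesetting a glyph, so the expansion is visible:

```tex
% Plain TeX sketch. This \uchar is a hypothetical stand-in that only
% reports its two arguments (vector, slot); it is not ConTeXt's \uchar.
\def\uchar#1#2{[vector #1, slot #2]}
\def\unicodeasciicharacter{\uchar{0}}
\def\unimod{65}% slot of LATIN CAPITAL LETTER A in vector 0
% \unicodeasciicharacter\unimod -> \uchar{0}\unimod,
% so the token \unimod becomes the second (slot) argument.
\message{\unicodeasciicharacter\unimod}% should print [vector 0, slot 65]
\bye
```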
Play with it and we'll see.
Working with the Unicode code makes me appreciate what a powerful part of ConTeXt it is. Thanks, Hans!
How about the following: there are many font encodings around, but none is complete enough to deal with basic Unicode (the 0/1/2 range). Why not define a new font encoding with characters only, so that we can have as many characters as needed in a 0-255 vector? All those special characters (registered, and so on) are (1) seldom used and (2) not related to hyphenation and kerning. It is also a way to get rid of some 'ligatures', like --- becoming an em dash (in ConTeXt and XML we can comfortably call symbols directly, and these may come from a different instance of the font).

Hans