[NTG-context] unicode and out-of-box usability

Hans Hagen ntg-context@ntg.nl
Sat, 03 Jan 2004 23:38:02 +0100

At 18:59 02/01/2004, you wrote:

>I've been struggling through, trying to learn Unicode in ConTeXt. It's
>been instructive, at least. (Hope to make a MyWay about it...)


>There are a few weird things that made it difficult to learn, and I was
>wondering if someone could help explain why things are the way they are.
>In unic-ini:
>\chardef\utfunihashmode=0 % 1 = enabled
>Actually, if I understand things correctly, '1' means "disabled", which
>is what I preferred, having not yet created any unicode vectors. So the
>internal documentation there seems wrong, and I would argue the default
>case (0) makes it harder for beginners.

hm, did you look at the unic-001 etc files? the trick is in fast and efficient
expansion without the need to define lots of named glyphs

>More confusingly, in font-uni:

forget about that one, although it's called unicode, it's actually a
mechanism for the many vectors derived from unicode / related to unicode
but not entirely, i.e. cjk fonts

>   {\definefontsynonym[\s!Unicode][\getvalue{\??uc#1\c!file}]%
>    \def\unicodescale             {\getvalue{\??uc#1\c!schaal}}%
>    \def\unicodeheight            {\getvalue{\??uc#1\c!hoogte}}%
>    \def\unicodedepth             {\getvalue{\??uc#1\c!diepte}}%
>    \def\unicodedigits            {\getvalue{\??uc#1\c!conversie}}%
>    \def\handleunicodeglyph       {\getvalue{\??uc#1\c!commando}}%
>%%%%%%%%%%% NEXT LINE
>    \enableregime[unicode]% the following \relax's are realy needed
>    \doifvalue{\??uc#1\c!interlinie}\v!ja\setupinterlinespace\relax
>    \getvalue{\??uc#1\c!commandos}\relax}
>The \enableregime[unicode] runs in direct opposition with the
>\enableregime[utf] that normally goes at the start of (some of my)
>documents. As it stands, with the regime hard-coded, users have to put an
>\enableregime[utf] *after* the font declaration. That's awkward.

so, don't use that mechanism, stick to the utf mechanism
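
For illustration, a minimal sketch of a document that sticks to the utf mechanism: the regime is selected once at the top, as described in the quoted text, and no font-uni style unicode font declaration is involved. The sample text is made up, and whether the body font actually provides these glyphs is an assumption.

```tex
% minimal sketch: select the utf regime once, before any text
\enableregime[utf]

\starttext
% sample UTF-8 input (illustrative only)
Some accented text: é ü ñ
\stoptext
```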

>The last proposed change/complaint is back in unic-ini, and came from my
>attempts to match the main body font with the unicode font.
>   {\xdef\unidiv{\number\utfdiv{#1}}%
>    \xdef\unimod{\number\utfmod{#1}}%
>    \ifnum#1<\utf@i
>%%%% \unicodeasciicharacter\unimod
>      \char\unimod % \unicodeascii\unimod
>    \else\ifcsname\@@univector\unidiv\endcsname
>      \csname\doutfunihash{\unidiv}{#1}\endcsname
>    \else % so, these can be different fonts !
>      \unicodeglyph\unidiv\unimod % no \uchar (yet)
>    \fi\fi}
>Basically, I'd like to use the \unicodeasciicharacter hook with this
>(I'm not certain the above is release-quality code, but I've been testing
>it with a stripped down \utfunifontglyph that should be functionally

play with it and we'll see
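
For readers following the quoted macro: the names \utfdiv and \utfmod suggest that a character number is split into a vector number and a slot inside that vector. The plain TeX sketch below shows that arithmetic under the assumption that the split is by 256, which fits the unic-001 style files and a 0-255 slot range; the actual definitions in unic-ini are not shown in this thread and may differ. The count register names are hypothetical.

```tex
% plain TeX sketch; register names are hypothetical, not from unic-ini
\newcount\codepoint
\newcount\vectornumber
\newcount\vectorslot

\codepoint="20AC % U+20AC, used here only as a sample value

% vector number: integer division by 256 (assumed split)
\vectornumber=\codepoint
\divide\vectornumber by 256

% slot: codepoint - 256 * vectornumber
\vectorslot=\vectornumber
\multiply\vectorslot by -256
\advance\vectorslot by \codepoint

% writes "vector 32, slot 172" to the terminal and log
\message{vector \the\vectornumber, slot \the\vectorslot}
\bye
```

With this split, numbers whose vector has been defined can be looked up by vector and slot, and the rest fall through to the generic glyph handler, as in the \else branches of the quoted code.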

>Working with the unicode code makes me appreciate that it's a really
>powerful part of ConTeXt. Thanks, Hans!

how about the following:

there are many font encodings around but none is really complete enough to
deal with basic unicode (0/1/2 range)

why not define a new font encoding with characters only so that we can have
as many chars as needed in a 0-255 vector; all those special characters
(registered, and so on) are (1) used seldom, (2) not related to hyphenation
and kerning; it is also a way to get rid of some 'ligatures' like ---
becoming an emdash (in context and xml we can comfortably call symbols
directly, and these may come from a different instance of the font)