Re: [NTG-context] unicode and out-of-box usability

3 Jan 2004


      At 18:59 02/01/2004, you wrote:
...
I've been struggling through, trying to learn Unicode in ConTeXt. It's
been instructive, at least. (Hope to make a MyWay about it...)
good
...
There are a few weird things that made it difficult to learn, and I was
wondering if someone could help explain why things are the way they are.
In unic-ini:
\chardef\utfunihashmode=0 % 1 = enabled
Actually, if I understand things correctly, '1' means "disabled", which
is what I preferred, having not yet created any unicode vectors. So the
internal documentation there seems wrong, and I would argue the default
case (0) makes it harder for beginners.
hm, did you look at the unic-001 etc files? the trick is in fast and efficient
expansion without the need to define lots of named glyphs
...
More confusingly, in font-uni:
forget about that one; although it's called unicode, it's actually a
mechanism for the many vectors derived from unicode / related to unicode
but not entirely unicode, i.e. cjk fonts
...
\def\enableunicodefont#1%
  {\definefontsynonym[\s!Unicode][\getvalue{\??uc#1\c!file}]%
   \def\unicodescale             {\getvalue{\??uc#1\c!schaal}}%
   \def\unicodeheight            {\getvalue{\??uc#1\c!hoogte}}%
   \def\unicodedepth             {\getvalue{\??uc#1\c!diepte}}%
   \def\unicodedigits            {\getvalue{\??uc#1\c!conversie}}%
   \def\handleunicodeglyph       {\getvalue{\??uc#1\c!commando}}%
%%%%%%%%%%% NEXT LINE
   \enableregime[unicode]% the following \relax's are realy needed
   \doifvalue{\??uc#1\c!interlinie}\v!ja\setupinterlinespace\relax
   \getvalue{\??uc#1\c!commandos}\relax}
The \enableregime[unicode] runs in direct opposition with the
\enableregime[utf] that normally goes at the start of (some of my)
documents. As it stands, with the regime hard-coded, users have to put an
\enableregime[utf] *after* the font declaration. That's awkward.
so, don't use that mechanism, stick to the utf mechanism
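
A minimal sketch of what sticking to the utf mechanism looks like in a document, assuming a UTF-8 encoded source file. Only \enableregime[utf] comes from the discussion above; the sample text is made up for illustration.

```tex
% Minimal sketch: select the utf regime once, before any text,
% and do not call \enableunicodefont at all.
\enableregime[utf]

\starttext
Grüße, café, naïve. % sample UTF-8 input, illustrative only
\stoptext
```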
...
The last proposed change/complaint is back in unic-ini, and came from my
attempts to match the main body font with the unicode font.
\def\utfunifontglyph#1%
  {\xdef\unidiv{\number\utfdiv{#1}}%
   \xdef\unimod{\number\utfmod{#1}}%
   \ifnum#1<\utf@i
%%%% \unicodeasciicharacter\unimod
     \char\unimod % \unicodeascii\unimod
   \else\ifcsname\@@univector\unidiv\endcsname
     \csname\doutfunihash{\unidiv}{#1}\endcsname
   \else % so, these can be different fonts !
     \unicodeglyph\unidiv\unimod % no \uchar (yet)
   \fi\fi}
Basically, I'd like to use the \unicodeasciicharacter hook with this
definition:
\def\unicodeasciicharacter{\uchar{0}}
(I'm not certain the above is release-quality code, but I've been testing
it with a stripped down \utfunifontglyph that should be functionally
equivalent.)
play with it and we'll see
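
For readers following the macro above: the \unidiv / \unimod pair splits a code point into a vector number and a slot within that vector. A rough sketch of that arithmetic in plain TeX primitives, assuming 256 slots per vector (the real \utfdiv and \utfmod in unic-ini may be implemented differently). The names \splitcodepoint, \vectorcount and \slotcount are hypothetical and not part of ConTeXt.

```tex
% Illustrative only: \splitcodepoint, \vectorcount and \slotcount are
% made-up names, not part of unic-ini. Assumes 256 slots per vector.
\newcount\vectorcount
\newcount\slotcount

\def\splitcodepoint#1{%
  \vectorcount=#1\relax
  \divide\vectorcount by 256      % vector number (integer part)
  \slotcount=\vectorcount
  \multiply\slotcount by -256
  \advance\slotcount by #1\relax} % slot = code point - 256 * vector

\splitcodepoint{8364}% U+20AC (euro sign): vector 32, slot 172
\message{vector \the\vectorcount, slot \the\slotcount}
```

With this split, any code point below 256 lands in vector 0, which is the case the \unicodeasciicharacter hook is meant to catch.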
...
Working with the unicode code makes me appreciate that it's a really
powerful part of ConTeXt. Thanks, Hans!
how about the following:

there are many font encodings around but none is really complete enough
to deal with basic unicode (0/1/2 range)

why not define a new font encoding with characters only, so that we can
have as many chars as needed in a 0-255 vector? all those special
characters (registered, and so on) are (1) used seldom, (2) not related
to hyphenation and kerning; it is also a way to get rid of some
'ligatures' like --- becoming an emdash (in context and xml we can
comfortably call symbols directly, and these may come from a different
instance of the font)

Hans

Hans Hagen