Re: [NTG-context] Math encoding in XeTeX

22 May 2006

      On 5/21/06, Hans Hagen wrote:
...
Mojca Miklavec wrote:
...
I think that using \enableregime[utf] should work OK in XeTeX (except in loading patterns).
You can play with:
\startregime[none]
  \dostepwiserecurse{128}{255}{1}
    {\expanded{\defineactivecharacter
       {\recurselevel}
       {\rawcharacter{\recurselevel}}}}
\stopregime
\enableregime[none]
\appendtoks
  \enableregime[utf]%
%   \everyhbox\expandafter{\the\everyhbox\enableregime[none]}% fails
\to \everymathematics
\starttext
text : ç ß
math : $ç ß [\text{\bf\enableregime[none]ç ß}] ç ß$
\stoptext
Taco may know why active chars behave a bit strangely in math (it may have to do with the multiple passes in math and info getting lost).
I'm only speculating here because I don't have my computer and Linux
at hand, but could it be because XeTeX handles characters in a
completely different way than pdfTeX? If you define 128 active
characters, the thing will behave pseudo-randomly.

Active characters have to be defined in the "unicode-way", not by
reimplementing Unicode inside ConTeXt macros (which is what ConTeXt
currently does for pdfTeX). I guess that the unicode vectors should be
redefined (or at least generalized) for the purposes of XeTeX.

I would suggest creating active characters for math and the "extended
latin" sections of Unicode, so that "faking" characters and using
EC-encoded fonts would still work (although XeTeX should also support
EC encoding in my opinion, which it currently doesn't). You can then
leave the rest of Unicode to be handled by XeTeX.

Explanation: if I use ec-encoded Type1 fonts, \ccaron will work, but
typing "č" directly won't. ConTeXt knows where \ccaron is located in
an EC font and is able to use it properly, but XeTeX doesn't know
where to look for the glyph "whatever the unicode number of ccaron
is". The only way out of it is to say
    \catcode`č=\active \defč{\ccaron}
So my suggestion would be to define active characters for those
Unicode letters where it makes sense to do it (accented latin
characters, math) and where they can be faked.
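
A minimal sketch of that idea for the Slovenian carons, assuming the
ConTeXt names \scaron, \zcaron and their uppercase variants resolve in
the current EC font the same way \ccaron does:
    % each Unicode letter becomes active and expands to the named glyph
    \catcode`č=\active \defč{\ccaron}
    \catcode`š=\active \defš{\scaron}
    \catcode`ž=\active \defž{\zcaron}
    \catcode`Č=\active \defČ{\Ccaron}
    \catcode`Š=\active \defŠ{\Scaron}
    \catcode`Ž=\active \defŽ{\Zcaron}
Everything that is not made active this way is still passed to XeTeX
unchanged.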

There was already a discussion about a similar topic some time ago
(but it might take too much processing time):
   if glyph exists in the font ( \ifnum\XeTeXcharglyph"xxxx>0 )
      use it
   else
      fake it
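
In TeX terms the test could look roughly like this. It is only a
sketch: \useorfake is a hypothetical helper name, and it assumes the
current font is one that XeTeX loads natively, since as far as I know
\XeTeXcharglyph is not meant for TFM-based fonts.
    % #1 = Unicode slot (e.g. "010D), #2 = fallback used to fake the glyph
    \def\useorfake#1#2{%
      \ifnum\XeTeXcharglyph#1>0
        \char#1\relax   % the font has the glyph: use it
      \else
        #2%             % the font lacks it: fake it
      \fi}
    % U+010D is c with caron
    \catcode`č=\active \defč{\useorfake{"010D}{\ccaron}}
The test runs every time the character is typeset, which is where the
processing-time concern comes from.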

\enableregime should also be modified in order to handle regimes other
than UTF properly in XeTeX. There are two possibilities: either let
XeTeX do it its own way, or read everything as bytes and let
ConTeXt handle regimes in its own way.
If XeTeX does the job, EC-encoded fonts won't work. If you let ConTeXt
process it, you might lose support for some of the more obscure regimes
that XeTeX can already handle.
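
The two possibilities could be sketched as follows, assuming the
\XeTeXinputencoding primitive is available in the XeTeX version at
hand and that il2 is the ConTeXt regime name for ISO 8859-2 (both are
assumptions, not something tested here):
    % possibility 1: XeTeX converts the input to Unicode itself
    \XeTeXinputencoding "iso-8859-2"

    % possibility 2: XeTeX hands over raw bytes, ConTeXt decodes them
    \XeTeXinputencoding "bytes"
    \enableregime[il2]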

Mojca