[NTG-context] unic-xxx.tex glyph lists: minor bugs, questions

Philipp Reichmuth reichmuth at web.de
Sun Nov 5 02:24:54 CET 2006


I've been writing a script that sifts through the unic-xxx.tex files to 
get a readable mapping of which Unicode characters are supported using 
\Amacron-style names.

In the process I found one bug and something that might be another bug:

- the Cyrillic block (unic-004.tex) is missing an \unknownchar line for 
U+04CF, so that the remaining (few) glyphs are off by one

- the Hebrew block (unic-005.tex) starts with a \numexpr line indicating 
an offset of 224 = E0; however, the first character in the list is 
U+05D0.  So either the whole block is off by 16, starting at 0x0490 
instead of 0x0500, or the 224 should be a 208 (=D0) instead.  BTW 
unic-005.tex is the only file with Macintosh line endings. Are the 
unic-xxx files automatically generated or maintained by hand?
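The arithmetic behind both observations can be checked with a short Python sketch. It assumes a simplified model in which each unic-xxx.tex file covers one 256-code-point block and its glyph list begins at a given offset inside that block; the helper names and the \glyphA/\glyphB entries are placeholders invented for illustration, not taken from the actual files.

```python
# Simplified, hypothetical model of a unic-xxx.tex glyph list:
# block number (0x04, 0x05, ...), a start offset, and a list of entries.

def code_point(block: int, offset: int, index: int = 0) -> int:
    """Code point of the index-th list entry in a 256-wide block."""
    return block * 0x100 + offset + index

def build_mapping(block, offset, entries):
    """Map code points to names; None stands for an \\unknownchar slot."""
    return {
        code_point(block, offset, i): name
        for i, name in enumerate(entries)
        if name is not None
    }

# Hebrew block: declared offset 224 versus the first listed character.
declared = code_point(0x05, 224)
expected = 0x05D0
print(f"declared start: U+{declared:04X}")            # U+05E0
print(f"difference:     {declared - expected}")       # 16
print(f"matching offset: {expected - 0x0500}")        # 208

# Cyrillic block: dropping the placeholder for one slot shifts
# every later entry down by one code point.
complete = build_mapping(0x04, 0xCE, ["\\glyphA", None, "\\glyphB"])
broken = build_mapping(0x04, 0xCE, ["\\glyphA", "\\glyphB"])
print(sorted(f"U+{cp:04X}" for cp in complete))       # U+04CE, U+04D0
print(sorted(f"U+{cp:04X}" for cp in broken))         # U+04CE, U+04CF
```

With the placeholder entry in place, the last glyph lands on U+04D0; without it, the same glyph is assigned to U+04CF.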

Incidentally, it would be trivial now to put the list of ConTeXt glyphs 
on the Wiki, if anyone's interested.

I wanted to use this to work towards better support for the whole range 
of ConTeXt glyphs with OpenType fonts under XeTeX, by reading what 
ConTeXt glyphs are available in a font and building a 
"\catcode`ā=\active \def ā {\amacron}"-style list for the rest. 
(Unfortunately this kind of list would be font-specific, but the generic 
alternative would be a huge list of active characters with an 
\ifnum\XeTeXcharglyph"....>0 macro behind it, and that would probably be 
quite slow.)  I wonder if there is a more intelligent way to achieve 
this goal; since part of the logic for mapping code points into glyph 
macros exists already, it would be easier if there were a way to reuse that.
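
As a rough illustration of the font-specific approach, here is a minimal Python sketch that emits one active-character definition for each code point that has a ConTeXt glyph macro but is missing from the font. Both inputs are assumptions: the glyph mapping would come from the unic-xxx.tex files, the set of covered code points from the font itself, and the function name and sample data are made up for this example.

```python
def fallback_lines(glyph_macros, font_code_points):
    """Return TeX lines for code points the font does not cover.

    glyph_macros:     dict of code point -> macro name without backslash
    font_code_points: set of code points present in the font
    """
    lines = []
    for cp in sorted(glyph_macros):
        if cp in font_code_points:
            continue  # the font has this glyph, no fallback needed
        ch = chr(cp)
        name = glyph_macros[cp]
        lines.append(f"\\catcode`{ch}=\\active \\def {ch} {{\\{name}}}")
    return lines

# Made-up sample data: the font covers U+0100 but not U+0101.
sample_macros = {0x0100: "Amacron", 0x0101: "amacron"}
sample_font = {0x0100}

for line in fallback_lines(sample_macros, sample_font):
    print(line)
```

For the sample data this prints a single line, the \amacron definition quoted above; the output would have to be regenerated for every font.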

The best way out would be if I could enable ConTeXt's UTF-8 regime while 
running XeTeX in \XeTeXinputencoding=bytes mode, but I haven't gotten 
that to work yet.

