Re: [NTG-context] Basic question on Unicode and ConTeXt

23 Jul 2005

      Christopher Creutzig wrote:
...
We already have
Iconv in Ruby and can, if we know that ISO-8859-2 is a single-byte
coding system, simply say
require 'iconv'
conv = Iconv.new("UTF-16", "ISO-8859-2")
# single-byte encoding: walk all 256 byte values, 0..255
256.times { |i| puts lookup[conv.iconv("%c" % i)] }
to get the whole list, assuming we've filled the lookup hash first.
Great!
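Iconv was deprecated in Ruby 1.9.3 and dropped from the standard library in Ruby 2.0, so on a current Ruby the same idea can be expressed with the built-in String#encode. This is a minimal sketch, assuming we only want the byte -> Unicode code point table for one single-byte encoding. The method name byte_table is made up for this example, and the lookup hash from the quoted snippet (code point -> TeX name) is left out.

```ruby
# Byte -> Unicode code point table for one single-byte encoding,
# built with String#encode instead of the (removed) Iconv class.
def byte_table(encoding)
  table = {}
  256.times do |byte|                       # all byte values 0..255
    char = [byte].pack("C").force_encoding(encoding)
    begin
      table[byte] = char.encode("UTF-8").ord
    rescue Encoding::UndefinedConversionError, Encoding::InvalidByteSequenceError
      # Some code pages leave byte values unassigned; skip those.
    end
  end
  table
end

iso2 = byte_table("ISO-8859-2")
printf("0x%02X -> U+%04X\n", 0xA1, iso2[0xA1])   # prints 0xA1 -> U+0104
```

Feeding each code point through a code point -> TeX name hash then gives the same list as the Iconv loop.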

Sorry for all my philosophising! I don't know Ruby (yet) and I didn't
even think of this possibility. My last idea was to parse and
combine the data from http://www.unicode.org/Public/MAPPINGS/VENDORS/,
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt and
http://partners.adobe.com/public/developer/en/opentype/aglfn13.txt,
but your idea is a hundred times faster and better! Thanks a lot!
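For comparison, the parse-and-combine route could look roughly like this. It assumes the two-column text layout used by most of the unicode.org mapping tables (byte, code point, `#` comment) and the semicolon-separated layout of UnicodeData.txt. The sample data is a two-line excerpt, the method names are hypothetical, the Adobe glyph list is not handled, and the First/Last range pairs that UnicodeData.txt uses for large blocks are not expanded.

```ruby
# Mapping-table lines look like "0xA1\t0x0104\t# LATIN CAPITAL LETTER A WITH OGONEK".
def parse_mapping(text)
  text.each_line.with_object({}) do |line, map|
    line = line.sub(/#.*/, "").strip   # drop comments; skip blank lines
    next if line.empty?
    byte, code = line.split(/\s+/)
    next if code.nil?                  # some tables list unmapped bytes
    map[Integer(byte)] = Integer(code) # Integer() understands the 0x prefix
  end
end

# UnicodeData.txt: field 0 is the hex code point, field 1 the character name.
def parse_unicode_data(text)
  text.each_line.with_object({}) do |line, names|
    fields = line.chomp.split(";")
    next if fields.size < 2
    names[fields[0].to_i(16)] = fields[1]
  end
end

mapping_sample = <<~TXT
  # excerpt in the mapping-table layout
  0x41\t0x0041\t# LATIN CAPITAL LETTER A
  0xA1\t0x0104\t# LATIN CAPITAL LETTER A WITH OGONEK
TXT

unicode_sample = <<~TXT
  0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
  0104;LATIN CAPITAL LETTER A WITH OGONEK;Lu;0;L;0041 0328;;;;N;LATIN CAPITAL LETTER A OGONEK;;;0105;
TXT

bytes = parse_mapping(mapping_sample)
names = parse_unicode_data(unicode_sample)
bytes.sort.each { |byte, code| printf("0x%02X U+%04X %s\n", byte, code, names[code]) }
```

It works, but it needs one downloaded table per encoding, which is exactly what a converter such as Iconv already carries.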
...
As you've said, I'd combine steps A2 and A3, to make ConTeXt run faster.
That's OK with me. If there's a simple internal Ruby tool (run every
time the Unicode->TeX mapping changes or support for another encoding
is added) instead of a one-time script, there should be no problem doing
that directly.
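One way to make such an internal tool run only when the mapping actually changes is a make-style timestamp check. This is a minimal sketch; the file names and the regenerate helper are hypothetical, and the block passed to regenerate stands in for whatever really writes the regi-* content.

```ruby
require "fileutils"
require "tmpdir"

# A target is stale if it is missing or older than any of its sources.
def stale?(target, sources)
  return true unless File.exist?(target)
  target_time = File.mtime(target)
  sources.any? { |src| File.mtime(src) > target_time }
end

# Rewrite target with the block's result only when it is stale.
# Returns true if the file was (re)generated.
def regenerate(target, sources)
  return false unless stale?(target, sources)
  File.write(target, yield)
  true
end

Dir.mktmpdir do |dir|
  src = File.join(dir, "mapping.txt")
  out = File.join(dir, "regi-example.tex")        # hypothetical output name
  File.write(src, "0xA1\t0x0104\n")
  puts(regenerate(out, [src]) { "% generated\n" }) # true: output did not exist
  puts(regenerate(out, [src]) { "% generated\n" }) # false: output is up to date
  FileUtils.touch(src, mtime: Time.now + 10)       # pretend the mapping changed
  puts(regenerate(out, [src]) { "% generated\n" }) # true
end
```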
...
If you want, for whatever reason, to use \textellipsis for an
ellipsis (it just looks horribly wrong to me) instead of \dots, you'd
need to invoke the Ruby script which generates the regi-* files.
I just wanted to give an example that changes are sometimes needed and
that it is difficult to track down all the places where they have to
be made. Sorry, this example wasn't very illustrative: I don't even
know what \textellipsis stands for; I just saw some comments about
changes and discrepancies in the regi-* files.
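The \textellipsis case is where generation pays off: if every regi-* file is produced from one code point -> TeX command table, the change is made in one place. This is a sketch under that assumption. TEX_NAMES and regime_entries are hypothetical, the \Aogonek entry is only a placeholder, and Windows-1250 and Windows-1252 are used because both put HORIZONTAL ELLIPSIS (U+2026) at byte 0x85.

```ruby
# Hypothetical central table: Unicode code point -> TeX command.
TEX_NAMES = {
  0x2026 => "\\dots",      # HORIZONTAL ELLIPSIS
  0x0104 => "\\Aogonek",   # placeholder entry, for illustration only
}

# Byte -> TeX command entries for one single-byte encoding.
def regime_entries(encoding, tex_names)
  (0..255).each_with_object({}) do |byte, entries|
    begin
      code = [byte].pack("C").force_encoding(encoding).encode("UTF-8").ord
    rescue Encoding::UndefinedConversionError, Encoding::InvalidByteSequenceError
      next   # byte value not assigned in this code page
    end
    entries[byte] = tex_names[code] if tex_names.key?(code)
  end
end

# One change in the central table...
names = TEX_NAMES.merge(0x2026 => "\\textellipsis")

# ...shows up in every regime generated from it.
%w[Windows-1250 Windows-1252].each do |enc|
  puts "#{enc}: 0x85 -> #{regime_entries(enc, names)[0x85]}"
end
```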
...
The whole thing should not require any change at all to ConTeXt
itself, since the regi-* files could look exactly as they do now, just
being generated automatically.  (For the multibyte encodings, the whole
thing gets much more tricky.)
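Why the multibyte case is harder can be seen directly: in a single-byte encoding every one of the 256 byte values is a complete character, so looping over them enumerates the whole code page, whereas in UTF-8 only the 128 ASCII bytes stand on their own. A small illustration (standalone_bytes is a made-up name):

```ruby
# How many of the 256 one-byte strings are complete, valid characters
# on their own in the given encoding?
def standalone_bytes(encoding)
  (0..255).count { |b| [b].pack("C").force_encoding(encoding).valid_encoding? }
end

puts standalone_bytes("ISO-8859-2")   # 256
puts standalone_bytes("UTF-8")        # 128
puts "\u0104".bytesize                # 2: A-ogonek needs two bytes in UTF-8
```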
I noticed (perhaps I'm wrong) that the TeX community's support for Cyrillic
may be better than what Unicode and the available old 8-bit
encodings offer. ConTeXt also already supports those unusual regimes
(ctt, dbk, mls, mnk, mos, ncc, ...) that I was unable to find anywhere
else. So one should also be careful here not to break this
existing feature.
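One way to avoid breaking the existing Cyrillic regimes would be to regenerate only those for which a converter exists and keep the hand-made files for the rest. This is a sketch, assuming Ruby's own transcoders are the converter. transcodable? is hypothetical, and the regime-to-encoding pairing in the usage lines is only illustrative (Ruby knows KOI8-R, but has no encoding called mnk).

```ruby
# Can Ruby transcode from UTF-8 into the named encoding?
def transcodable?(name)
  "a".encode(Encoding.find(name))   # probe conversion
  true
rescue ArgumentError, Encoding::ConverterNotFoundError
  false                             # unknown name or no converter
end

# Hypothetical regime name => encoding name to try.
{ "koi8-r" => "KOI8-R", "mnk" => "mnk" }.each do |regime, enc|
  action = transcodable?(enc) ? "regenerate" : "keep the existing hand-made file"
  puts "#{regime}: #{action}"
end
```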

I'm still slightly confused by the encoding files (texnansi, ec, ...;
in one case iso-8859-7 is used). Does this mean that it is impossible
(or at least very complex or slow) to access more than 256 characters
from a single font at once?
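On the font side, one concrete place where the 256 limit appears is the .enc files used by dvips and pdfTeX: each is a PostScript encoding vector with exactly 256 entries, one glyph name per slot, matching the 256 character codes of a classic TFM font. Below is a rough reader for that layout. encoding_vector is a hypothetical name, and real .enc files may need more careful tokenizing.

```ruby
# Read glyph names from "/SomeEncoding [ /g0 /g1 ... /g255 ] def".
def encoding_vector(text)
  body = text.gsub(/%.*$/, "")[/\[(.*)\]/m, 1] or raise ArgumentError, "no [ ... ] array found"
  glyphs = body.scan(%r{/([^\s/\[\]]+)}).flatten    # "/name" tokens inside [ ]
  raise ArgumentError, "expected 256 slots, got #{glyphs.size}" unless glyphs.size == 256
  glyphs
end

sample = "% demo vector\n/DemoEncoding [\n" +
         (["/.notdef"] * 255 + ["/ydieresis"]).join(" ") + "\n] def\n"
glyphs = encoding_vector(sample)
puts glyphs.size    # 256
puts glyphs[255]    # ydieresis
```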

Mojca

Mojca Miklavec