Thomas A. Schmitz wrote:
For your general problem you need to define a new regime that maps each relevant character sequence to the corresponding Unicode character. That is, you inform ConTeXt that the character stream it sees is actually a way of encoding another set of characters, and that it can forget the original stream. This translation should happen before any font property comes into play, because it does not depend on the appearance of the typeset text. That's what regimes are for.
regimes are a solution, but which solution is best depends on the input stream ... whole document? partial document? also written to external files? eventually everything can become unicode (private areas) and as such travel through the system; or we can misuse virtual fonts ...
we could plug into the input stream reading routine (just like other regimes work).
there are mechanisms for that (because that's what i played a lot with last year); there was (maybe even is) a mechanism for chained processing of input etc
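to make that concrete, a minimal sketch of such an input-stream hook, in bare luatex terms (context traps callbacks and registers them through its own interface, so real mkiv code would look different; the mapping table and function name here are just made up for illustration):

-- map babel-style greek input sequences to unicode at read time
local mapping = {
  ["<a"] = utf8.char(0x1F01), -- alpha with dasia
  [">a"] = utf8.char(0x1F00), -- alpha with psili
}

local function babel_to_unicode(line)
  -- replace each two-byte babel sequence by its unicode character;
  -- a real regime would handle longer sequences (accents, iota) first
  return (line:gsub("[<>]a", mapping))
end

-- utf8.char assumes a lua 5.3 luatex; older ones had unicode.utf8.char
callback.register("process_input_buffer", babel_to_unicode)

once the buffer is rewritten this way, the rest of the system only ever sees real unicode characters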
That approach would actually tell ConTeXt that you are handling Latin characters with a special appearance (which the font takes care of); so, for example, the underlying text in a PDF would be a stream of Latin characters, and copying-and-pasting would yield Latin characters, not Greek.
not entirely true ... we can (and do) intercept the node stream ... ok, at that point we're dealing with a font/char pair, but we can change the char (or node) to whatever we like ... depends on the problem
The question of copy-and-paste is one of the big mysteries, and I have no clue why it works in some cases but not in others. Right now, on my system (OS X 10.4), only Adobe Reader 8.0 does copy-paste correctly, and it does it correctly whether I use babel or Unicode input. Never touch a running system: I just take this as some sort of divine favor and leave it at that...
that's a matter of associating tounicode entries, of course; no unicode means no copy/paste :-)
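roughly, on the luatex side a character entry in a font table can carry a tounicode field (a string of utf-16be hex digits) that ends up in the pdf's /ToUnicode cmap; the helper below is hypothetical, it just shows the shape of the data:

local function patch_tounicode(fontdata, slot, unicode)
  -- this sketch only handles bmp characters (no surrogate pairs)
  local chr = fontdata.characters and fontdata.characters[slot]
  if chr then
    chr.tounicode = string.format("%04X", unicode)
  end
end

-- e.g. make a private-area slot copy/paste as U+1F01:
-- patch_tounicode(fontdata, 0xE001, 0x1F01)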
That is not what you want here: you want your "a" to be understood as "alpha" and your "less-than grave-sign w vertical-bar" to be considered an "omega with dasia, varia and subscribed iota". Nor should you think of these transformations as a collection of ligatures (which act at the font level), but rather as a text encoding, just like UTF-8 is an encoding of the Unicode characters: in UTF-8 the byte sequence "hexadecimal byte E1, hexadecimal byte BC, hexadecimal byte 81" is the coding for the Unicode character U+1F01 GREEK SMALL LETTER ALPHA WITH DASIA, and in the Babel input scheme for Ancient Greek the same character is encoded with the byte sequence "hexadecimal byte 3C [ASCII '<'], hexadecimal byte 61 [ASCII 'a']".
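(For the record, the UTF-8 arithmetic is easy to check from Lua; utf8.char assumes a Lua 5.3 interpreter:

-- the three bytes that encode U+1F01 in utf-8
local s = utf8.char(0x1F01)
print(string.format("%02X %02X %02X", s:byte(1, -1)))
--> E1 BC 81

so the byte sequence really is just another spelling of the same character.)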
Yes, that's crystal clear. It would also take care of another problem: in the input stream, you know exactly which character sequence translates to what. On the font level, legacy fonts sometimes have their own ideas about where to put certain glyphs.
depends ... the input char becomes a node; now, if (probably controlled by attributes) a certain char is seen (say 'a') and you want it to be an alpha, well, we can change that char then in the node
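in code, that node-level change could look roughly like this (bare luatex flavour again; the attribute number and the table are invented for the example, context has its own attribute allocation and node-processing hooks):

local GLYPH = node.id("glyph")
local greek_attribute = 9999 -- hypothetical; really you'd allocate one
local to_greek = { [string.byte("a")] = 0x03B1 } -- 'a' -> alpha

local function greekify(head)
  for n in node.traverse_id(GLYPH, head) do
    -- only touch glyphs that carry the marker attribute
    if node.has_attribute(n, greek_attribute) then
      local replacement = to_greek[n.char]
      if replacement then
        n.char = replacement
      end
    end
  end
  return head
end

callback.register("pre_linebreak_filter", greekify)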
Of course in the past, these transformations were handled at the font level and sequences like "<a" were actually ligatures, because that was all we had (and copy-pasting from a PDF was, mostly, doomed to fail); but we should not persist in that use now that we can treat them as real Unicode characters.
those hard-coded mechanisms were indeed not sufficient

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74
www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------