On 11/4/06, Philipp Reichmuth wrote:
I've been starting to reuse some of this work in a script to do active character assignment for XeTeX depending on what glyphs are present in an OpenType font, so that those characters for which the font doesn't have a glyph are generated by ConTeXt. Basically I want to produce something like this:
\ifnum\XeTeXcharglyph"010D=0 \catcode`č=\active \def č{\ccaron} \else \catcode`č=\letter \fi % ConTeXt knows this letter -> better hyphenation
\ifnum\XeTeXcharglyph"1E0D=0 \catcode`ḍ=\active \def ḍ{\d{d}} \else \catcode`ḍ=\letter \fi % ConTeXt doesn't know this letter
No reason for not adding it.
(with \other, respectively, for non-letters). Being somewhat of a novice to TeX programming, I'm not sure if this will work, though, and I'm also not sure if it's better to generate static scripts that do this for every font (so the resulting TeX file is a font-specific big list of \catcode`$CHARACTERs) or to do this dynamically on every font change, maybe limited to selectable Unicode ranges (which is more general but also a lot slower).
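A script along the lines Philipp describes could emit such lines from a table. The following is a minimal Python sketch, not Philipp's actual script. The table contents and the names `FALLBACKS`, `catcode_line` and `generate` are hypothetical, and it assumes ConTeXt's `\letter`/`\other` catcode constants. Because every emitted line keeps the `\XeTeXcharglyph` test, the generated file is not font-specific: XeTeX decides at load time, for whatever font is current.

```python
# Hypothetical generator for XeTeX/ConTeXt catcode setup lines like the ones above.
# The table is illustrative only: (code point, ConTeXt fallback, is it a letter?).
FALLBACKS = [
    (0x010D, r"\ccaron", True),  # c with caron
    (0x1E0D, r"\d{d}", True),    # d with dot below (plain TeX \d = dot-under accent)
]


def catcode_line(codepoint: int, fallback: str, is_letter: bool) -> str:
    """Return one \\ifnum ... \\fi line that makes the character active,
    with a fallback definition, only if the current font has no glyph for it."""
    ch = chr(codepoint)
    # Characters the font does have keep a normal catcode:
    # \letter for letters (better hyphenation), \other otherwise.
    normal = r"\letter" if is_letter else r"\other"
    return (
        rf'\ifnum\XeTeXcharglyph"{codepoint:04X}=0 '
        rf"\catcode`{ch}=\active \def {ch}{{{fallback}}} "
        rf"\else \catcode`{ch}={normal} \fi"
    )


def generate(table) -> str:
    """Join one conditional line per table entry."""
    return "\n".join(catcode_line(cp, fb, letter) for cp, fb, letter in table)


if __name__ == "__main__":
    print(generate(FALLBACKS))
```

For U+010D this reproduces the first line quoted above exactly (without the comment).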
Generating this for every single font would be stupid. This should be part of low-level XeTeX (Jonathan has promised to look into it at some point). In my opinion, the best way to deal with it would be the ability to define a fallback definition for "every" letter missing from a font. If, for example, "ddotbelow" is missing from your font, XeTeX would ask ConTeXt whether a fallback definition has been provided for that glyph. If so, it would use that fallback, "\d{d}"; if the glyph is present in the font, XeTeX would use the glyph itself.
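The proposed mechanism amounts to a per-character lookup at typesetting time. The Python sketch below only models that proposal; it is not an existing XeTeX interface, and `resolve`, the font's character set and the fallback table are hypothetical.

```python
# Illustrative model of the proposed per-glyph fallback; it does not
# use or describe any real XeTeX interface.
from typing import Dict, List, Set, Tuple


def resolve(text: str, font_chars: Set[str],
            fallbacks: Dict[str, str]) -> List[Tuple[str, str]]:
    """For each character, decide whether the font's glyph is used or a
    fallback definition registered by the macro package is substituted."""
    result = []
    for ch in text:
        if ch in font_chars:
            # The font has a glyph: use it directly.
            result.append(("glyph", ch))
        elif ch in fallbacks:
            # Glyph missing, but a fallback was registered: use it instead.
            result.append(("fallback", fallbacks[ch]))
        else:
            # Neither: nothing can be typeset for this character.
            result.append(("missing", ch))
    return result


font = set("abcd")                      # hypothetical font coverage
fallbacks = {"\u1e0d": r"\d{d}"}        # ddotbelow -> \d{d}
print(resolve("a\u1e0d", font, fallbacks))
```

One apparent advantage over the catcode approach is that characters keep their normal catcodes, so the decision does not interfere with hyphenation.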
I'd prefer to see a context encoding added to GNU recode for the benefit of future archeologists trying to decipher ancient documents.
That would be better I guess, but isn't ConTeXt encoding a moving target in that characters can still get added? Or is the list fixed to AGL glyph names and nothing else?
No, it's certainly not fixed to AGL. But I wouldn't object to adding it to GNU recode (on top of "(La)TeX", which also recognizes \v, \b, ...) if someone decided to make a good revision of it, if more people thought it would be useful, and if the developers were open to the idea. I try to use Unicode in my sources whenever possible.

Mojca

PS for Philipp: I didn't try out your definitions, but here is an excerpt from an older conversation, as an example of what certainly doesn't work under XeTeX ;) (the answer was written by Jonathan Kew). I was trying to write a few macros to support the old TFM-based fonts (for a different reason than yours), but figured out that that was the wrong starting point.
\catcode`ð=\active \def ð{^^f0}
\starttext
Testing ... ð
\stoptext
and it seems to enter some infinite loop when ð is encountered (I can define any other letter as well, but only ^^f0 is causing problems).
No, this seems to me like the wrong way to define the character! And I think you would have the same problem with other letters if you tried to define them as their own codes; the ones that work for you must be getting defined as *different* codes from the original input.

The ^^xx notation is converted to a literal character by TeX's input scanning routine, so it behaves exactly as if it were that character itself. And ^^f0 in Latin-1 (or Unicode) is the ð character. So this definition works exactly the same as if you were to say

\catcode`ð=\active \def ð{ð}

which is clearly recursive.

Given that you don't need to remap ð in the input to some other Unicode character for printing, there should be no need for this at all. The only reason to use a definition like this would be if the input text used a *different* character where you want to print eth, or if you want to print something *other* than character F0 for the input ð.

In general, a "safe" form of the definition would be to use \chardef:

\catcode`ð=\active \chardef ð="F0

This makes ð equivalent to \char"F0: it typesets character "F0 directly instead of expanding to anything. There is an important difference between this and ^^f0, which actually "becomes" the character ð itself as the input is read (and therefore inherits its catcode, definition, etc.).
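Jonathan's point, that ^^f0 is resolved while the input is read and before any definition is consulted, can be illustrated with a small Python model. This is a simplified, hypothetical sketch (`expand_double_caret` and `expand_active` are made-up names), assuming ^ has its usual catcode 7; it ignores TeX's rescanning of the converted character and XeTeX's four-digit ^^^^ form.

```python
HEX_DIGITS = "0123456789abcdef"  # TeX accepts only lowercase hex after ^^


def expand_double_caret(line: str) -> str:
    """Simplified model of TeX's ^^ input convention.

    ^^xy (two lowercase hex digits) becomes the character whose code is
    the hex number xy; ^^c (any other character with code < 128) becomes
    the character whose code is code(c) + 64 or code(c) - 64.
    Assumes ^ has catcode 7. Unlike real TeX, the converted character is
    not rescanned, and XeTeX's ^^^^xxxx form is not handled.
    """
    out = []
    i = 0
    while i < len(line):
        if line.startswith("^^", i) and i + 2 < len(line):
            a = line[i + 2]
            b = line[i + 3] if i + 3 < len(line) else ""
            if b and a in HEX_DIGITS and b in HEX_DIGITS:
                out.append(chr(int(a + b, 16)))
                i += 4
                continue
            code = ord(a)
            if code < 128:
                out.append(chr(code + 64 if code < 64 else code - 64))
                i += 3
                continue
        out.append(line[i])
        i += 1
    return "".join(out)


def expand_active(char: str, macros: dict, limit: int = 1000) -> str:
    """Keep replacing an active character by its (one-character) macro body.

    Real TeX would loop forever on \\def ð{ð}; this model gives up after
    `limit` steps instead.
    """
    for _ in range(limit):
        if char not in macros:
            return char
        char = macros[char]
    raise RecursionError(f"{char!r} still expands after {limit} steps")


# \def ð{^^f0}: the body is converted while the input is read ...
body = expand_double_caret("^^f0")
print(body == "\u00f0")  # the body is ð itself
# ... so the active ð expands to ð, which expands to ð, and so on.
try:
    expand_active("\u00f0", {"\u00f0": body})
except RecursionError as err:
    print("loop:", err)
```

With \chardef, by contrast, ð is never expanded at all: it is an unexpandable token that typesets character "F0, so the loop cannot arise.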