Khaled Hosny wrote:
On Wed, Sep 21, 2011 at 01:00:08AM +0200, Stephan Hennig wrote:
The same pattern approach can be used to handle non-standard hyphenation, ligaturing, round-s recognition, etc. (I think there are use-cases in Arabic script as well.)
I have to admit this is all Greek to me :) (The only language written in Arabic script that accepts hyphenation is Uyghur, and only in the new, Chinese-imposed orthography.)
I didn't mean hyphenation in Arabic script.

In Latin black letter script there are two different glyphs for the small letter s: the long ſ and the usual round s. Most keyboards don't provide a key for the ſ, so source documents usually use the round s only. To put an ſ at the respective places in the typeset document, the traditional way is to mark up those places with s+ or s: (different black letter fonts and support packages use different conventions).

Now, automatically applying glyph substitution at the correct places within a character stream /without/ mark-up is pretty much the same problem as finding hyphenation positions within a character stream without mark-up. The same holds for applying non-standard hyphenation without mark-up, or for applying ligatures only at the correct places without mark-up. Currently, TeX inserts ligatures based on a greedy rule, which produces many false positives in languages with compound words.

For Arabic script, I was referring to glyph substitution. But I don't know enough about Arabic script to explain further. :)

Best regards,
Stephan Hennig
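
P.S. To make the analogy with hyphenation patterns a bit more concrete, here is a rough sketch in Python. It is only an illustration, not an actual implementation and not TeX's hyphenation code: the function name `long_s` and the four patterns are invented for this example, and real pattern sets would be generated from word lists. As with hyphenation patterns, competing levels are resolved by taking the maximum; here an odd level on an s means "long ſ" and an even level means "round s".

```python
# Toy patterns, invented for illustration only.
# Each pattern maps a context to one level per character;
# "." stands for a word boundary.
PATTERNS = {
    "s":     [1],              # default: long s
    "s.":    [2, 0],           # round s at the end of a word
    "aust":  [0, 0, 2, 0],     # round s before the boundary in "Haus|tür"
    "auste": [0, 0, 3, 0, 0],  # exception to the above: "hauste"
}


def long_s(word, patterns=PATTERNS):
    """Replace round s by long ſ wherever the winning pattern level is odd."""
    padded = "." + word.lower() + "."
    levels = [0] * len(padded)

    # Apply every pattern at every position where it matches;
    # the highest level seen for a character wins.
    for pattern, pattern_levels in patterns.items():
        start = padded.find(pattern)
        while start != -1:
            for offset, level in enumerate(pattern_levels):
                position = start + offset
                levels[position] = max(levels[position], level)
            start = padded.find(pattern, start + 1)

    # Drop the two boundary markers and substitute lowercase s only.
    result = []
    for char, level in zip(word, levels[1:-1]):
        result.append("ſ" if char == "s" and level % 2 == 1 else char)
    return "".join(result)


if __name__ == "__main__":
    for word in ["Wasser", "das", "Hauses", "Haustür", "hauste"]:
        print(word, "->", long_s(word))
```

The point of the levels is that a specific pattern can override a more general one without any mark-up in the source: the general "aust" pattern keeps the s round in "Haustür", and the more specific "auste" pattern restores the long ſ in "hauste".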