On Wed, Sep 21, 2011 at 01:00:08AM +0200, Stephan Hennig wrote:
Taco Hoekwater wrote:
On 09/15/2011 10:56 PM, Khaled Hosny wrote:
I just came across this library: http://sourceforge.net/projects/hunspell/files/Hyphen/
Hyphenation in luatex is in fact an adaptation of a (slightly earlier) version of libhnj. At that time, it did not do compound word stuff yet, so I have to check that out. It did already have non-standard hyphenation then.
However, that was implemented as such a hack that I decided to leave it out of the new luatex code, and opted for non-standard hyphenation in the exceptions instead of in the patterns proper. (What libhnj did at that time was disguise dictionary exceptions as patterns, so the non-standard hyphenation 'pattern rules' were in fact complete words with a single non-standard hyphenation somewhere in them.)
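To make the "non-standard hyphenation in the exceptions" idea concrete, here is a minimal sketch in Python. It is not luatex's or libhnj's actual data structure: the `Break` record, the `EXCEPTIONS` table and `break_word` are hypothetical names invented for illustration. The example entry uses the traditional German rule that "ck" breaks as "k-k".

```python
from typing import NamedTuple


class Break(NamedTuple):
    """One non-standard break point inside a whole-word exception (hypothetical)."""
    position: int     # index in the word where the replaced text starts
    pre: str          # text that ends the first line, hyphen included
    post: str         # text that starts the second line
    replace_len: int  # how many characters of the word are replaced


# Hypothetical exception table: whole words, each with explicit break records.
# Traditional German orthography: "backen" breaks as "bak-ken".
EXCEPTIONS = {
    "backen": [Break(position=2, pre="k-", post="k", replace_len=2)],
}


def break_word(word: str, brk: Break) -> tuple[str, str]:
    """Return the two line fragments produced by taking the given break."""
    head = word[:brk.position] + brk.pre
    tail = brk.post + word[brk.position + brk.replace_len:]
    return head, tail
```

For example, `break_word("backen", EXCEPTIONS["backen"][0])` yields the fragments `"bak-"` and `"ken"`, while the unbroken word keeps its "ck".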
As Taco already pointed out, libhnj mixes up regular patterns and non-standard hyphenation patterns. A while ago I sent Taco a proposal about compound word hyphenation that clearly separates patterns with different semantics.
In this context, different semantics means different hyphenation penalties. That is, provide a separate set of hyphenation patterns for each hyphenation penalty that is needed, i.e., patterns

* for compound word hyphenation,
* for prefix and suffix hyphenation,
* for suppressing aesthetically unpleasant hyphenations,
* etc.
For the German language, I think even more than five different penalty classes could be desirable. All these sets of patterns can be applied to a word in parallel and the penalties are chosen according to which pattern set matches a spot.
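A rough Python sketch of the parallel idea, under loudly stated assumptions: the patterns are Liang-style strings, each pattern set carries one penalty, and where several sets allow the same spot the lowest penalty wins. That tie rule, the penalty values, the toy patterns and all function names are hypothetical; this is neither the proposal's actual design nor luatex code, and suppression patterns are left out.

```python
import re


def parse_pattern(pat):
    """Split a Liang-style pattern such as 'n1b' into letters 'nb' and levels [0, 1, 0]."""
    letters = re.sub(r"\d", "", pat)
    levels = [0] * (len(letters) + 1)
    i = 0
    for ch in pat:
        if ch.isdigit():
            levels[i] = int(ch)
        else:
            i += 1
    return letters, levels


def break_levels(word, patterns):
    """Apply one pattern set to '.word.' and return the maximum level at every position."""
    text = "." + word + "."
    levels = [0] * (len(text) + 1)
    for pat in patterns:
        letters, plevels = parse_pattern(pat)
        start = text.find(letters)
        while start != -1:
            for k, lv in enumerate(plevels):
                levels[start + k] = max(levels[start + k], lv)
            start = text.find(letters, start + 1)
    return levels


def hyphenate(word, pattern_sets):
    """pattern_sets is a list of (penalty, patterns) pairs, evaluated independently.

    Returns {i: penalty}, where i means a break is allowed before word[i].
    Assumption: if several sets allow the same spot, the lowest penalty wins.
    """
    result = {}
    for penalty, patterns in pattern_sets:
        levels = break_levels(word, patterns)
        for i in range(1, len(word)):
            # levels[i + 1] is the spot before word[i]; odd levels allow a break.
            if levels[i + 1] % 2 == 1:
                result[i] = min(result.get(i, penalty), penalty)
    return result
```

With a toy compound set `(0, ["n1b"])` and a toy syllable set `(50, ["i1s"])`, calling `hyphenate("eisenbahn", ...)` marks the compound break "eisen-bahn" with penalty 0 and the break "ei-senbahn" with penalty 50. Each set is evaluated independently against the same word, which is the point of applying them in parallel rather than in successive passes.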
The same pattern approach can be used to handle non-standard hyphenation, ligaturing, round-s recognition, etc. (I think there are use-cases in Arabic script as well.)
I don't know what Taco's current plan is, though. The corresponding tracker item reads "multi-pass hyphenation", URL:http://tracker.luatex.org/view.php?id=168, whereas my proposal is about applying patterns in parallel rather than in multiple passes.
I have to admit this is all Greek to me :) (The only language written in Arabic script that accepts hyphenation is Uyghur, and that is for the new, Chinese-imposed orthography.)

I just thought re-using existing code and patterns might be of interest, but I can't judge the quality of either of them.

Regards,
Khaled

--
Khaled Hosny
Egyptian Arab