khaledhosny at eglug.org
Wed Sep 21 11:25:31 CEST 2011
On Wed, Sep 21, 2011 at 01:00:08AM +0200, Stephan Hennig wrote:
> Taco Hoekwater wrote:
> > On 09/15/2011 10:56 PM, Khaled Hosny wrote:
> >> I just came across this library:
> >> http://sourceforge.net/projects/hunspell/files/Hyphen/
> > Hyphenation in luatex is in fact an adaptation of a (slightly
> > earlier) version of libhnj. At that time, it did not do compound word
> > stuff yet, so I have to check that out. It did then already have
> > non-standard hyphenation.
> > However, that was implemented as such a hack that I decided to leave
> > it out in the new luatex code, and instead opted for non-standard
> > hyphenation in the exceptions instead of in the patterns proper.
> > (what libhnj did at that time was disguising dictionary exceptions
> > as patterns, so the non-standard hyphenation 'pattern rules' were in
> > fact complete words with a single non-standard hyphenation in it
> > somewhere.)
> As Taco already pointed out, libhnj mixes up regular patterns and
> non-standard hyphenation patterns. I sent a proposal about compound
> word hyphenation to Taco a while ago that clearly separates patterns
> with different semantics.
> In this context, different semantics means different hyphenation
> penalties. That is, provide different sets of hyphenation patterns for
> all needed hyphenation penalties, i.e., patterns
> * for compound word hyphenation,
> * for prefix and suffix hyphenation,
> * for suppressing aesthetically unpleasant hyphenations,
> * etc.
> For the German language, I think even more than five different penalty
> classes could be desirable. All these sets of patterns can be applied
> to a word in parallel and the penalties are chosen according to which
> pattern set matches a spot.
> The same pattern approach can be used to handle non-standard
> hyphenation, ligaturing, round-s recognition, etc. (I think there are
> use-cases in Arabic script as well.)
> I don't know what Taco's current plan is, though. The corresponding
> tracker item reads "multi-pass hyphenation",
> <URL:http://tracker.luatex.org/view.php?id=168>, whereas my proposal is
> about applying patterns in parallel rather than in multiple passes.
I have to admit this is all Greek to me :) (the only language written in
Arabic script that accepts hyphenation is Uyghur, and that is for the
new, Chinese-imposed orthography). I just thought re-using existing code
and patterns might be of interest, but I can't judge the quality of
either of them.
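
For readers who, like me, find the quoted proposal abstract, here is a
minimal sketch of what "applying pattern sets in parallel" could look
like. It is not LuaTeX or libhnj code. The function names, the toy
patterns, the penalty values, and the rule that the first matching set
in priority order decides a spot are all assumptions made for
illustration; the proposal only says that the penalty depends on which
pattern set matches.

```python
def parse_pattern(pattern):
    """Split a Liang-style pattern such as 'k1k' into its letters and
    the digit levels that sit in the gaps around those letters."""
    letters = []
    levels = [0]
    for ch in pattern:
        if ch.isdigit():
            levels[-1] = int(ch)      # digit belongs to the current gap
        else:
            letters.append(ch)
            levels.append(0)          # open a new gap after this letter
    return "".join(letters), levels


def break_points(word, patterns):
    """Return the indices i where a break before word[i] is allowed by
    one pattern set (odd level wins if it is the maximum at that gap)."""
    padded = "." + word + "."
    levels = [0] * (len(padded) + 1)  # levels[k] is the gap before padded[k]
    for pat in patterns:
        letters, pat_levels = parse_pattern(pat)
        start = padded.find(letters)
        while start != -1:
            for offset, lvl in enumerate(pat_levels):
                levels[start + offset] = max(levels[start + offset], lvl)
            start = padded.find(letters, start + 1)
    # Only gaps strictly inside the word count; padded[k] is word[k - 1].
    return {k - 1 for k in range(2, len(padded) - 1) if levels[k] % 2 == 1}


def penalties(word, pattern_sets):
    """Run every pattern set over the same word and attach a penalty to
    each break point. ASSUMPTION: sets are listed in priority order and
    the first set that matches a spot decides its penalty."""
    result = {}
    for name, penalty, patterns in pattern_sets:
        for pos in break_points(word, patterns):
            result.setdefault(pos, (name, penalty))
    return result


# Hypothetical toy data: three penalty classes, lower penalty = better break.
PATTERN_SETS = [
    ("compound", 0, ["k1k"]),
    ("suffix", 50, ["p1er."]),
    ("default", 100, ["k1k", "p1e"]),
]

if __name__ == "__main__":
    word = "bookkeeper"
    for pos, (name, penalty) in sorted(penalties(word, PATTERN_SETS).items()):
        print(word[:pos] + "-" + word[pos:], name, penalty)
```

Each set is matched independently against the unmodified word, so no
set sees the result of another; that is the difference from a
multi-pass scheme. A real implementation would need a better-defined
rule for spots where several sets match.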