David Kastrup wrote:
Taco Hoekwater writes:
David Kastrup wrote:
Here at the DANTE conference I just learnt that Werner Lemberg is creating a large corpus of two separate "all hyphenations" and "main hyphenations" lists (about 400000 words IIRC) for German. So indeed it would appear that if LuaTeX offered hyphenation according to prioritized patterns, the data to make it typeset better documents in German would be reasonably well available.
If there are two 'hyphenation levels', wouldn't it be easier if luatex supported running through two (or even more) separate pattern sets, and added the 'hitcount' to the discretionary?
Easier on what account?
Extending discretionary nodes is easier for me than extending the internal pattern data structure: I could program the whole multiple-pass approach in only a few days. It is also easier in the sense that it can use existing patterns, with no need to mess with patgen output and no need for extensive testing of the postprocessed output. But if you can create these extended patterns, I'll wait for that.
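To make the multiple-pass idea concrete, here is a minimal sketch in Python (not LuaTeX code). It assumes that 'hitcount' means the number of pattern sets that allow a break at a given position; that reading, the function names, and the two toy pattern lists are all illustrative assumptions, not real patgen output or Werner Lemberg's data. It also ignores the left/right hyphenmin limits.

```python
import re

def parse_pattern(pat):
    """Split a Liang-style pattern such as 's1t' into letters and weights."""
    letters = re.sub(r"\d", "", pat)
    weights = [0] * (len(letters) + 1)
    pos = 0
    for ch in pat:
        if ch.isdigit():
            weights[pos] = int(ch)   # weight sits before letters[pos]
        else:
            pos += 1
    return letters, weights

def hyphen_points(word, patterns):
    """Positions i such that a break before word[i] is allowed (odd weight)."""
    padded = "." + word.lower() + "."
    scores = [0] * (len(padded) + 1)   # scores[j]: weight before padded[j]
    for pat in patterns:
        letters, weights = parse_pattern(pat)
        start = padded.find(letters)
        while start != -1:
            for k, w in enumerate(weights):
                scores[start + k] = max(scores[start + k], w)
            start = padded.find(letters, start + 1)
    # word[i] is padded[i + 1], so the weight before word[i] is scores[i + 1]
    return {i for i in range(1, len(word)) if scores[i + 1] % 2 == 1}

def hit_counts(word, pattern_sets):
    """Run one pass per pattern set; count how many sets allow each break."""
    counts = {}
    for patterns in pattern_sets:
        for i in hyphen_points(word, patterns):
            counts[i] = counts.get(i, 0) + 1
    return counts

# Toy, hand-written pattern sets (hypothetical, for illustration only).
ALL_PATTERNS = ["n1d", "s1t"]    # "all hyphenations": bun-des-tag
MAIN_PATTERNS = ["s1t"]          # "main hyphenations": bundes-tag

print(hit_counts("bundestag", [ALL_PATTERNS, MAIN_PATTERNS]))
```

In this sketch the break in "bundes-tag" is found by both passes and the break in "bun-destag" by only one, so a later stage could turn the counts into different penalties. Each pass works on an ordinary, unmodified pattern list, which is the point of the approach.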
Disadvantage: wastes a few CPU cycles because of multiple passes.
Well, hyphenation is not the fastest operation in the world.
It maxes out at about 10% of runtime in a plain TeX Latin-text document with all bells and whistles like protruding and hz turned off, and a sane text width (about two-thirds of plain's default), while generating DVI. It is usually lower in other formats because of more work done by macros, or special features like hz, or PDF output, or RL text, or math, or use of OpenType fonts. Hyphenation time is not negligible, and everything that slows the engine down warrants some discussion. But e.g. the ConTeXt MkIV code spends less than 1% of its runtime hyphenating. So we are not talking landslides either.
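As a back-of-the-envelope check on these figures, the sketch below assumes that only the hyphenation share of the run is slowed by some factor and everything else is unchanged. The function name and the factor of 2 are illustrative assumptions, not measurements.

```python
def total_slowdown(hyph_fraction, hyph_factor):
    """Overall runtime factor if only hyphenation gets hyph_factor times slower.

    hyph_fraction: share of total runtime spent hyphenating (0.0 to 1.0)
    hyph_factor:   assumed slowdown of the hyphenation step itself
    """
    return 1.0 + hyph_fraction * (hyph_factor - 1.0)

# Worst case quoted above (plain TeX, about 10%) and the ConTeXt MkIV
# case (under 1%), each with hyphenation time assumed to double.
for fraction in (0.10, 0.01):
    print(fraction, round(total_slowdown(fraction, 2.0), 4))
```

Under these assumptions, even a full doubling of hyphenation time would lengthen the whole run by at most about 10% in the plain TeX case and about 1% in the MkIV case.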
Doubling its runtime when one could instead add what amounts to an attribute to the final chosen point seems a bit pointless.
It will be a bit slower, but I doubt runtime will actually double. Some reasons why it will not be twice as bad:
* this approach will slow down hyphenation a bit for languages that do not have these extended patterns
* enlarging the pattern object data has a speed penalty also
* there is more (programming as well as runtime) work needed to get the 'right' penalty than in the multiple pass case
* discretionary nodes have to be enlarged because in this case you have to store actual penalties instead of hitcounts, otherwise there can be external changes to the penalty values
It all depends on how hard it is to create these special patterns. Can you do that easily, or would it be a lot of work?
Best wishes,
Taco