On Sat, Apr 03, 2021 at 06:02:10PM +0200, Hans Hagen wrote:
> german is just an example, dutch has some specific things, and i bet other languages have their demands so my aim is some general mechanism
I appreciate that, but if you want data of sufficiently good quality to use this mechanism for individual languages, you need to invest a *lot* of time in each one of them. German is one of the very few languages I know of that has an active group of people working to produce that data, the “Trennmuster people”, as Mojca calls them ;-) Their word list supports many fine points of typography, even some that few programs can use, for example weighted hyphenation. Ligature prevention came in as a side project.

Dutch, by contrast, does not seem so well served: the OpenTaal group is dormant and no longer offers the hyphenated word list it once did (that was already the case five years ago). The most relevant page I can find, https://www.opentaal.org/projecten/woordafbreking, dates from 2009. There have apparently been recent updates by a single person (who incidentally sometimes contributes to the German hyphenation working group), but they are rather generic.

Best,
Arthur