Re: [NTG-context] Ligature suppression word list
Message: 2
Date: Tue, 6 Apr 2021 15:03:54 +0000
From: denis.maier@ub.unibe.ch
To: j.hagen@xs4all.nl, ntg-context@ntg.nl
Subject: Re: [NTG-context] Ligature suppression word list
Message-ID: <41e6530172b54bffb7a82febff0a6be5@ub.unibe.ch>

-----Original Message-----
From: Hans Hagen <j.hagen@xs4all.nl>
Sent: Saturday, 3 April 2021 17:58
To: mailing list for ConTeXt users <ntg-context@ntg.nl>; Maier, Denis Christian (UB) <denis.maier@ub.unibe.ch>
Subject: Re: [NTG-context] Ligature suppression word list
[…]
2. A bigger solution might be to use selnolig's patterns in a script that can be run over a large corpus, such as the DWDS (Digitales Wörterbuch der deutschen Sprache). That should give us a more complete list of words in which ligatures must be suppressed.
where is that DWDS ... i can write some code to deal with it (i'd rather start from the source than from some interpretation; who knows what more there is to uncover)
As it turns out, the linguists who helped with the selnolig package used another corpus: the Stuttgart "Deutsch" Web-as-Corpus. They describe their approach in this paper: https://raw.githubusercontent.com/SHildebrandt/selnolig-check/master/selnoli...
A lot of corpora can be found at https://wortschatz.uni-leipzig.de/de, in particular at https://wortschatz.uni-leipzig.de/de/download/German. There are corpora for many other languages, too, such as English, French, Dutch, Spanish, Russian, Japanese, Latin, …

HTH
Ralf
From: ntg-context
Hi,
a small update on this one:
I’ve built a small Python script that uses the patterns from the selnolig package to extract words with suspicious ligatures from the word lists provided by the Uni Leipzig corpus project. Running the script over a corpus of over 1 million words produces the attached word list. The result is not huge: that corpus gives us about 790 words. I’ll need to check whether they are already in the goodies file or whether I need to add them.
Anyway, I was thinking about making such a script more generic. Think of something along the lines of:
pdftotext book.pdf | showIncorrectLigatures.py > incorrect-ligatures.txt
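A minimal sketch of such a filter: it reads whitespace-separated words from stdin and prints each word that matches a suppression pattern. The three patterns below are hypothetical samples in the spirit of selnolig's rules (the real package ships many more); a real script would load the full pattern set from selnolig.

```python
import re
import sys

# Hypothetical sample patterns: each regex marks a spot where a ligature
# (ff, fi, fl, ...) would span a morpheme boundary and should be suppressed.
SAMPLE_PATTERNS = [
    re.compile(r"auf(?=f[aeiloru])", re.IGNORECASE),  # e.g. "auffinden"
    re.compile(r"kauf(?=l)", re.IGNORECASE),          # e.g. "Kaufleute"
    re.compile(r"hof(?=f[aeiou])", re.IGNORECASE),    # e.g. "Hoffest"
]

def suspicious_words(words):
    """Yield the words that contain a boundary-crossing ligature."""
    for word in words:
        if any(p.search(word) for p in SAMPLE_PATTERNS):
            yield word

if __name__ == "__main__":
    # Read words from stdin, report each suspicious word once (in order).
    words = sys.stdin.read().split()
    for word in dict.fromkeys(suspicious_words(words)):
        print(word)
```

Piped after pdftotext as in the one-liner above, this would report the words in a PDF whose ligatures need checking.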
Denis
participants (2)
- denis.maier@ub.unibe.ch
- rha17@t-online.de