Hi,
where can I find the hyphenation patterns used by ConTeXt? I have two wrongly hyphenated words, and I want to check whether this is due to incorrect patterns. (I tried the source browser... not much luck so far.) The words are:
1. applicable => hyphenated as applic-able
2. obligated => hyphenated as oblig-ated
I know I can use \hyphenation to correct that, but I wanted to check the patterns nevertheless.
Best,
Denis
Hi,
you can find the patterns in this directory: texlive/2020/texmf-dist/tex/context/patterns/mkiv/
Best wishes,
Tomáš
On Thu, Oct 08, 2020 at 05:41:09PM +0200, Denis Maier wrote:
# where can I find the hyphenation patterns used by ConTeXt? [...]
Tomáš Hála
--------------------------------------------------------------------
Mendelova univerzita, Provozně ekonomická fakulta, ústav informatiky
Zemědělská 1, CZ-613 00 Brno, tel. +420 545 13 22 28
--------------------------------------------------------------------
http://akela.mendelu.cz/~thala
On 08.10.2020 at 17:41, Denis Maier wrote:
where can I find the hyphenation patterns used by ConTeXt? I have two wrongly hyphenated words, and I want to check whether this is due to incorrect patterns. (I tried the source browser... not much luck so far.) The words are:
1. applicable => hyphenated as applic-able
2. obligated => hyphenated as oblig-ated
I know I can use \hyphenation to correct that, but I wanted to check the patterns nevertheless.
I guess it’s just a valid option. You can check possible hyphenations like this:
\starttext
{EN: \en\hyphenatedcoloredword{applicable}}
{DE: \de\hyphenatedcoloredword{applicable}}
\stoptext
Hraban
On 08.10.2020 at 19:05, Henning Hraban Ramm wrote:
\starttext
{EN: \en\hyphenatedcoloredword{applicable}}
{DE: \de\hyphenatedcoloredword{applicable}}
\stoptext
Wow, that's super helpful. The English hyphenation seems to be "ap-plic-a-ble". According to Merriam-Webster it should just be "ap·pli·ca·ble".
{EN: \en\hyphenatedcoloredword{obligate}} gives me "ob-lig-ate". According to Merriam-Webster it should be "ob·li·gate". I've had a look at the files mentioned by Tomáš, but as these are not just word lists I cannot really tell what is happening. So, is that a bug?
Best,
Denis
On 9 Oct 2020, at 08:52, Denis Maier wrote:
So, is that a bug?
Not really. Hyphenation patterns are a bit like applying JPEG compression to a dictionary: they make the data smaller by recognising patterns while ignoring outliers. Occasional errors are to be expected, which is why \hyphenation exists.
Best wishes,
Taco
On 09.10.2020 at 08:57, Taco Hoekwater wrote:
So, is that a bug?
Not really. Hyphenation patterns are a bit like applying JPEG compression to a dictionary. It makes the data size smaller by recognising patterns while ignoring outliers.
Occasional errors are to be expected, which is why \hyphenation exists.
I see. I've noticed lang-us.lua has a list of exceptions in it:
["exceptions"]={
 ["characters"]="abcdefghijlmnoprstuyz",
 ["data"]="as-so-ciate as-so-ciates dec-li-na-tion oblig-a-tory phil-an-thropic present presents project projects reci-procity re-cog-ni-zance ref-or-ma-tion ret-ri-bu-tion ta-ble",
 ["length"]=168,
 ["n"]=14,
},
Would it be possible to add more exceptions to that list as they come up? Or is that inappropriate?
Denis
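The three metadata fields in that table look derivable from the data string: the distinct letters used, the length of the string, and the number of entries. The Python sketch below checks that reading against the quoted values and shows how an exception entry maps to permitted break positions. The derivation rules are inferred from the numbers, not taken from ConTeXt's source, and the function names are hypothetical. Entries without hyphens, such as present or project, then allow no break at all, which is how TeX exceptions suppress hyphenation of a word.

```python
# Hypothetical helpers: the rules for "characters", "length" and "n" are
# inferred from the values quoted from lang-us.lua, not from ConTeXt itself.

DATA = ("as-so-ciate as-so-ciates dec-li-na-tion oblig-a-tory "
        "phil-an-thropic present presents project projects reci-procity "
        "re-cog-ni-zance ref-or-ma-tion ret-ri-bu-tion ta-ble")


def exception_metadata(data: str) -> dict:
    """Recompute the three metadata fields from the raw exception string."""
    return {
        "characters": "".join(sorted({c for c in data if c.isalpha()})),
        "length": len(data),        # characters in the raw string, spaces included
        "n": len(data.split()),     # number of space-separated entries
    }


def parse_exceptions(data: str) -> dict:
    """Map each word (hyphens removed) to the offsets where a break is allowed."""
    table = {}
    for entry in data.split():
        offsets, pos = [], 0
        for piece in entry.split("-")[:-1]:
            pos += len(piece)
            offsets.append(pos)
        table[entry.replace("-", "")] = offsets
    return table


def apply_exception(word: str, table: dict):
    """Return the hyphenated word, or None if the word has no exception entry.

    A real hyphenator would fall back to the patterns when this returns None.
    """
    offsets = table.get(word)
    if offsets is None:
        return None
    bounds = [0] + offsets + [len(word)]
    return "-".join(word[a:b] for a, b in zip(bounds, bounds[1:]))


if __name__ == "__main__":
    print(exception_metadata(DATA))
    table = parse_exceptions(DATA)
    for w in ("obligatory", "present", "applicable"):
        print(w, "->", apply_exception(w, table))
```

With the quoted data, exception_metadata reproduces characters="abcdefghijlmnoprstuyz", length=168 and n=14, so appending an entry such as ap-pli-ca-ble would presumably only require updating those three fields alongside the data string.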
On 10/9/2020 9:01 AM, Denis Maier wrote:
I see. I've noticed lang-us.lua has a list of exceptions in it:
["exceptions"]={
 ["characters"]="abcdefghijlmnoprstuyz",
 ["data"]="as-so-ciate as-so-ciates dec-li-na-tion oblig-a-tory phil-an-thropic present presents project projects reci-procity re-cog-ni-zance ref-or-ma-tion ret-ri-bu-tion ta-ble",
 ["length"]=168,
 ["n"]=14,
},
Would it be possible to add more exceptions to that list as they come up? Or is that inappropriate?
You can add your own at runtime in a style:
\hyphenation {fo-ob-ar}
\hsize 1mm foobar
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
On 09.10.2020 at 14:48, Hans Hagen wrote:
On 10/9/2020 9:01 AM, Denis Maier wrote:
Would it be possible to add more exceptions to that list as they come up? Or is that inappropriate?
You can add your own at runtime in a style:
\hyphenation {fo-ob-ar}
\hsize 1mm foobar
Sure. I use \startexceptions[en] for that. I just thought everyone might benefit... Denis
On 09.10.2020 at 08:52, Denis Maier wrote:
On 08.10.2020 at 19:05, Henning Hraban Ramm wrote:
\starttext
{EN: \en\hyphenatedcoloredword{applicable}}
{DE: \de\hyphenatedcoloredword{applicable}}
\stoptext
Wow, that's super helpful.
BTW, \hyphenatedword works the same; I didn’t see anything colored. There are some more commands like this, even \hyphenatedfile, see https://source.contextgarden.net/tex/context/base/mkiv/supp-box.mkiv?search=...
Usually Arthur’s (hail the emperor of hyphenation and protector of the patterns) patterns are flawless, so I guess it’s not a bug but an exception to the rules.
Hraban
On 10/9/2020 10:15 AM, Henning Hraban Ramm wrote:
BTW \hyphenatedword works the same. I didn’t see anything colored. There are some more commands like this, even \hyphenatedfile, see https://source.contextgarden.net/tex/context/base/mkiv/supp-box.mkiv?search=...
Usually Arthur’s (hail the emperor of hyphenation and protector of the patterns) patterns are flawless, so I guess it’s not a bug but an exception of the rules.
ancient secret features:
mtxrun --script patterns --hyphenate applicable --language=gb

hyphenator |
hyphenator | . a p p l i c a b l e .   . a p p l i c a b l e .
hyphenator |  2a0p0                     2 0 0 0 0 0 0 0 0 0 0
hyphenator |    4p1p2                   2 4 1 2 0 0 0 0 0 0 0
hyphenator |      0p2l2                 2 4 1 2 2 0 0 0 0 0 0
hyphenator |              1a0b0         2 4 1 2 2 0 1 0 0 0 0
hyphenator |                2b0l2       2 4 1 2 2 0 1 2 0 2 0
hyphenator |                  4l0e0.0   2 4 1 2 2 0 1 2 4 2 0
hyphenator | .2a4p1p2l2i0c1a2b4l2e0.   . a p-p l i c-a b l e .
hyphenator |
mtx-patterns | gb 3 3 : applicable : applic-able
mtxrun --script patterns --hyphenate applicable --language=us

hyphenator |
hyphenator | . a p p l i c a b l e .   . a p p l i c a b l e .
hyphenator |    4p1p0                   0 4 1 0 0 0 0 0 0 0 0
hyphenator |      1p2l2                 0 4 1 2 2 0 0 0 0 0 0
hyphenator |      0p0l0i2c1a0b0         0 4 1 2 2 2 1 0 0 0 0
hyphenator |            1c0a0           0 4 1 2 2 2 1 0 0 0 0
hyphenator |            0c0a1b0l0       0 4 1 2 2 2 1 1 0 0 0
hyphenator |                0b2l2       0 4 1 2 2 2 1 1 2 2 0
hyphenator |                0b4l0e0.0   0 4 1 2 2 2 1 1 4 2 0
hyphenator | .0a4p1p2l2i2c1a1b4l2e0.   . a p-p l i c-a-b l e .
hyphenator |
mtx-patterns | us 3 3 : applicable : applic-a-ble
Not the kind of stuff one wants to expose a new user to.
Hans
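For readers who want to decode those traces: each row is one matching pattern, the right-hand vector is the running maximum of the digits for each gap between letters, and a break is allowed where the final value is odd and the left/right minimums (the "3 3" in "gb 3 3") permit it. The following is a minimal Python sketch of that Liang-style procedure, fed only with the patterns printed in the two traces above. It is an illustration under those assumptions, not ConTeXt's own Lua hyphenator, and the function names are hypothetical.

```python
# Minimal sketch of Liang-style hyphenation with TeX-like patterns.
# Only the patterns shown in the mtx-patterns traces are used; the real
# en-us and en-gb sets contain thousands more.

US = ["4p1p", "1p2l2", "pli2c1ab", "1ca", "ca1bl", "b2l2", "b4le."]
GB = ["2ap", "4p1p2", "p2l2", "1ab", "2bl2", "4le."]


def parse_pattern(pattern: str):
    """Split e.g. 'pli2c1ab' into ('plicab', [0, 0, 0, 2, 1, 0, 0])."""
    letters, values = [], [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)  # the digit belongs to the gap before the next letter
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def gap_values(word: str, patterns: list) -> list:
    """Max value per gap of '.word.', like the last vector in a trace."""
    w = "." + word.lower() + "."
    table = dict(parse_pattern(p) for p in patterns)
    points = [0] * (len(w) + 1)       # points[i] = value before w[i]
    for i in range(len(w)):
        for j in range(i + 1, len(w) + 1):
            values = table.get(w[i:j])
            if values is not None:
                for k, v in enumerate(values):
                    points[i + k] = max(points[i + k], v)
    return points[1:len(w)]           # gaps from '.|first letter' to 'last letter|.'


def hyphenate(word: str, patterns: list, left: int = 2, right: int = 2) -> str:
    """Odd gap values allow a break, subject to the left/right minimums."""
    gaps = gap_values(word, patterns)
    pieces, start = [], 0
    for m in range(left, len(word) - right + 1):
        if gaps[m] % 2 == 1:          # gaps[m] sits between word[m-1] and word[m]
            pieces.append(word[start:m])
            start = m
    pieces.append(word[start:])
    return "-".join(pieces)


if __name__ == "__main__":
    print(gap_values("applicable", US))
    for name, pats in (("us", US), ("gb", GB)):
        for n in (2, 3):
            print(name, n, n, hyphenate("applicable", pats, n, n))
```

With these inputs the sketch reproduces the final rows of both traces: applic-a-ble (us) and applic-able (gb) at 3 3, and ap-plic-a-ble / ap-plic-able at 2 2. The ap- break only appears once the left minimum is lowered to 2, as in Arthur's run with --left=2.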
Denis’ latest question reminded me of an earlier query he had about hyphenation, asking why “applicable” and “obligated” were hyphenated by ConTeXt as ap-plic-a-ble and ob-lig-at-ed, and not ap-pli-ca-ble and ob-li-ga-te(d) like in Merriam-Webster (the discussion started at https://mailman.ntg.nl/pipermail/ntg-context/2020/099695.html). First of all, I note that while Webster’s dictionary is a useful guide, and indeed a major reference for any American typographer, there’s no absolute rule that we have to follow it either. The break applic-able, for example, does look acceptable to me; oblig-ated, less so. Taco reminded us that when producing a set of hyphenation patterns from a list of hyphenated words, we’re essentially compressing information, and that some minor deviations are to be expected. However, in my experience, unexpected breakpoints are almost never due to chance but rather to a deliberate decision. Then Hraban said:
On Fri, Oct 09, 2020 at 10:15:17AM +0200, Henning Hraban Ramm wrote:
Usually Arthur’s (hail the emperor of hyphenation and protector of the patterns) patterns are flawless, so I guess it’s not a bug but an exception of the rules.
I see that my self-appointed title is catching on, nice :-) Unfortunately the patterns are just as likely to contain errors as anything else, and in this particular case we’ll probably never know for sure, because the original hyphenated word list was never published (all the word lists from which patterns were produced in the 80s and 90s have been lost, for all languages). We’re thus reduced to guessing the intent of those who compiled the lists. We can get hints from looking at the patterns involved in the debatable breaks. Hans has a useful script:

$ mtxrun --script patterns --language=us --left=2 --right=2 --hyphenate applicable

hyphenator |
hyphenator | . a p p l i c a b l e .   . a p p l i c a b l e .
hyphenator |    4p1p0                   0 4 1 0 0 0 0 0 0 0 0
hyphenator |      1p2l2                 0 4 1 2 2 0 0 0 0 0 0
hyphenator |      0p0l0i2c1a0b0         0 4 1 2 2 2 1 0 0 0 0
hyphenator |            1c0a0           0 4 1 2 2 2 1 0 0 0 0
hyphenator |            0c0a1b0l0       0 4 1 2 2 2 1 1 0 0 0
hyphenator |                0b2l2       0 4 1 2 2 2 1 1 2 2 0
hyphenator |                0b4l0e0.0   0 4 1 2 2 2 1 1 4 2 0
hyphenator | .0a4p1p2l2i2c1a1b4l2e0.   . a p-p l i c-a-b l e .
hyphenator |
mtx-patterns | us 2 2 : applicable : ap-plic-a-ble

That tells us that there are seven patterns involved in hyphenating the word applicable: 4p1p, 1p2l2, pli2c1ab, 1ca, ca1bl, b2l2, and b4le. (the final dot is part of that last pattern). The pattern responsible for the break applic-able is pli2c1ab. If we now refer to the source repository for hyphenation patterns (since comments are stripped in the ConTeXt sources), https://github.com/hyphenation/tex-hyphen/blob/master/hyph-utf8/tex/generic/... , we can see at line 4508 “hyphen.tex patterns end here, and additional patterns begin:”, which means that the pattern pli2c1ab, at line 4817, is an “additional pattern”.
The background story is that hyphen.tex, the original hyphenation pattern file for American English, produced in 1982-1983 from a list of hyphenated words (mostly following Webster’s), was later augmented with more patterns that were supposed to improve hyphenation for many words. The person who added these new patterns apparently had a list of words hyphenated incorrectly (according to him) by hyphen.tex, but both that list and the one used to produce hyphen.tex are, as mentioned above, now lost, probably forever. In any case, the pattern that causes the break applic-able was clearly added intentionally, and as I said that break seems quite reasonable to me. Not so for the one in oblig-ated, so let’s have a look at that:

$ mtxrun --script patterns --language=us --left=2 --right=2 --hyphenate obligated

hyphenator |
hyphenator | . o b l i g a t e d .   . o b l i g a t e d .
hyphenator |  0o0b0l0i2g1             0 0 0 0 2 1 0 0 0 0
hyphenator |    0b2l2                 0 0 2 2 2 1 0 0 0 0
hyphenator |      5l0i0g0a0t0e0       0 0 5 2 2 1 0 0 0 0
hyphenator |        2i0g0             0 0 5 2 2 1 0 0 0 0
hyphenator |          1g0a0           0 0 5 2 2 1 0 0 0 0
hyphenator |              2t1e0d0     0 0 5 2 2 1 2 1 0 0
hyphenator | .0o0b5l2i2g1a2t1e0d0.   . o b-l i g-a t-e d .
hyphenator |
mtx-patterns | us 2 2 : obligated : ob-lig-at-ed

Here we see that the dubious break is caused by the pattern obli2g1, also an “additional pattern” (line 4783), and here it’s not hard to guess where it comes from: it has to be for the word obligatory, hyphenated regularly as o-blig-a-to-ry according to M-W -- and myself ;-) The incorrect breakpoint in oblig-ated is an undesired side effect of that.
Best,
ArthuR
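Arthur's reasoning can be replayed mechanically. The Python sketch below applies the patterns from his two traces, records which pattern supplies the winning odd value at each break, and then drops obli2g1 to show what obligated would look like without it. It assumes the traces list every matching pattern. In the full en-us set, removing obli2g1 would also affect obligatory and possibly other words, which is presumably why it was added, so this is an illustration rather than a proposed patch. The function and variable names are hypothetical.

```python
# Sketch: attribute each break to the pattern(s) that set its odd value,
# using only the patterns printed in the traces (the real set is larger).

APPLICABLE = ["4p1p", "1p2l2", "pli2c1ab", "1ca", "ca1bl", "b2l2", "b4le."]
OBLIGATED = ["obli2g1", "b2l2", "5ligate", "2ig", "1ga", "2t1ed"]


def parse_pattern(pattern):
    """Split e.g. 'obli2g1' into ('oblig', [0, 0, 0, 0, 2, 1])."""
    letters, values = [], [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def analyse(word, patterns, left=2, right=2):
    """Return (hyphenated word, {break offset: patterns giving its odd value})."""
    w = "." + word + "."
    gaps = [0] * (len(w) + 1)             # gaps[i] = value before w[i]
    sources = [[] for _ in gaps]          # patterns that set each maximum
    for pat in patterns:
        letters, values = parse_pattern(pat)
        start = w.find(letters)
        while start != -1:                # every occurrence of the pattern
            for k, v in enumerate(values):
                i = start + k
                if v > gaps[i]:
                    gaps[i], sources[i] = v, [pat]
                elif v == gaps[i] and v > 0:
                    sources[i].append(pat)
            start = w.find(letters, start + 1)
    pieces, winners, start = [], {}, 0
    for m in range(left, len(word) - right + 1):
        if gaps[m + 1] % 2 == 1:          # gap between word[m-1] and word[m]
            pieces.append(word[start:m])
            winners[m] = sources[m + 1]
            start = m
    pieces.append(word[start:])
    return "-".join(pieces), winners


if __name__ == "__main__":
    print(analyse("applicable", APPLICABLE))
    print(analyse("obligated", OBLIGATED))
    without = [p for p in OBLIGATED if p != "obli2g1"]
    print(analyse("obligated", without))
```

Under these assumptions the sketch attributes the applic-a break to pli2c1ab and the oblig-a break to obli2g1, matching Arthur's reading. Without obli2g1, the remaining patterns give ob-li-gat-ed, in line with the ob·li·gate division Denis quoted from Merriam-Webster.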
On 10/8/2020 7:05 PM, Henning Hraban Ramm wrote:
I guess it’s just a valid option. You can check possible hyphenations like this:
\starttext
{EN: \en\hyphenatedcoloredword{applicable}}
{DE: \de\hyphenatedcoloredword{applicable}}
\stoptext
Americans and Brits hyphenate differently:
\starttext
{\language[usenglish] {\tt US \number\normallanguage}: \hyphenatedcoloredword{applicable}}\par
{\language[ukenglish] {\tt UK \number\normallanguage}: \hyphenatedcoloredword{applicable}}\par
\stoptext
Syllable vs. stem (but I bet Arthur can explain better).
Hans
participants (6)
- Arthur Rosendahl
- Denis Maier
- Hans Hagen
- Henning Hraban Ramm
- Taco Hoekwater
- Tomas Hala