Hi all,
something fishy is going on with hyphenation patterns for German in mkiv. Here's a minimal test file:
\starttext
{\de \hyphenatedword{sich}}
\stoptext
please compile with mkii and mkiv and see the difference. The word should of course not be hyphenated.
All best
Thomas
Thomas A. Schmitz wrote:
Hi all,
something fishy is going on with hyphenation patterns for German in mkiv. Here's a minimal test file:
\starttext
{\de \hyphenatedword{sich}}
\stoptext
please compile with mkii and mkiv and see the difference. The word should of course not be hyphenated.
so the patterns are not good enough for lefthyphenmin=2
we can set (for de) ...
\c!lefthyphenmin=3, \c!righthyphenmin=3,
is that ok then?
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
On Nov 22, 2008, at 12:29 PM, Hans Hagen wrote:
Thomas A. Schmitz wrote:
Hi all,
something fishy is going on with hyphenation patterns for German in mkiv. Here's a minimal test file:
\starttext
{\de \hyphenatedword{sich}}
\stoptext
please compile with mkii and mkiv and see the difference. The word should of course not be hyphenated.
so the patterns are not good enough for lefthyphenmin=2
we can set (for de) ...
\c!lefthyphenmin=3, \c!righthyphenmin=3,
is that ok then?
It would help in this particular case, but in general, a hyphenation such as "al-le" is correct, so left|righthyphenmin=2 is OK. I suspect the error is not in lefthyphenmin and righthyphenmin, but in the patterns themselves. Which patterns does mkiv actually use, and how have they been produced?
Thomas
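A minimal Python sketch of the point about the hyphenmin values, under stated assumptions: the helper names `allowed_breaks` and `show` are invented for this illustration and are not part of ConTeXt or LuaTeX, and the candidate break positions are supplied by hand rather than computed from real patterns. It only shows that raising both minima to 3 hides the bad break in "sich" and, at the same time, forbids the legitimate "al-le".

```python
def allowed_breaks(word, candidates, left_min=2, right_min=2):
    """Keep only the break positions that leave at least left_min letters
    before the hyphen and at least right_min letters after it."""
    return [i for i in candidates if left_min <= i <= len(word) - right_min]


def show(word, breaks):
    """Render a word with hyphens inserted at the given positions."""
    pieces, last = [], 0
    for i in sorted(breaks):
        pieces.append(word[last:i])
        last = i
    pieces.append(word[last:])
    return "-".join(pieces)


# Candidate positions are hand-picked for illustration (hypothetical input).
for word, candidates in (("alle", [2]), ("sich", [2])):
    for minimum in (2, 3):
        kept = allowed_breaks(word, candidates, minimum, minimum)
        print(word, "hyphenmin", minimum, "gives", show(word, kept))
```

With both minima at 3 neither four-letter word can be broken at all, which is why the setting masks the symptom without addressing whatever produces the wrong break in the first place.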
Thomas A. Schmitz wrote:
On Nov 22, 2008, at 12:29 PM, Hans Hagen wrote:
Thomas A. Schmitz wrote:
Hi all,
something fishy is going on with hyphenation patterns for German in mkiv. Here's a minimal test file:
\starttext
{\de \hyphenatedword{sich}}
\stoptext
please compile with mkii and mkiv and see the difference. The word should of course not be hyphenated.
so the patterns are not good enough for lefthyphenmin=2
we can set (for de) ...
\c!lefthyphenmin=3, \c!righthyphenmin=3,
is that ok then?
It would help in this particular case, but in general, a hyphenation such as "al-le" is correct, so left|righthyphenmin=2 is OK. I suspect the error is not in lefthyphenmin and righthyphenmin, but in the patterns themselves. Which patterns does mkiv actually use, and how have they been produced?
i don't know; maybe do some experiments with mkii versus mkiv and different hyphenmin settings to see what happens
Hans
On Sat, Nov 22, 2008 at 12:35 PM, Thomas A. Schmitz wrote:
I suspect the error is not in lefthyphenmin and righthyphenmin, but in the patterns themselves. Which patterns does mkiv actually use, and how have they been produced?
It uses the patterns `dehypht-x' 2008-06-18 (WL)
Wait!!! Have de and deo been switched????
{ "de", "hyph-de-1901.tex", "german, old spelling" },
{ "deo", "hyph-de-1996.tex", "german, new spelling" },
Even though this is probably not the reason for your problem since mkii and mkiv should use the same patterns(?), the two lines in mtxrun.lua need to be switched.
Mojca
On Sat, Nov 22, 2008 at 1:13 PM, Mojca Miklavec wrote:
On Sat, Nov 22, 2008 at 12:35 PM, Thomas A. Schmitz wrote:
I suspect the error is not in lefthyphenmin and righthyphenmin, but in the patterns themselves. Which patterns does mkiv actually use, and how have they been produced?
It uses the patterns `dehypht-x' 2008-06-18 (WL)
Sorry, I was a bit inaccurate. It uses patterns from
tex/context/patterns/lang-de.pat
which have been generated with
mtxrun --script pattern --convert
that more or less copies the contents of hyph-de-1901.tex (should be 1996). And those patterns are more or less a literal copy of http://repo.or.cz/w/wortliste.git plus maybe some time delay.
Mojca
PS: in LaTeX there is indeed a difference whether one uses pdfTeX or XeTeX/LuaTeX since the two engines load different patterns, but in ConTeXt I see no reason for a different behaviour.
On Nov 22, 2008, at 1:22 PM, Mojca Miklavec wrote:
PS: in LaTeX there is indeed a difference whether one uses pdfTeX or XeTeX/LuaTeX since the two engines load different patterns, but in ConTeXt I see no reason for a different behaviour.
Hmm, that's a nice understatement :-) Fact is that the wrong hyphenation only occurs when I compile my little test document with luatex; both xetex and pdftex give the expected result. Changing left|righthyphenmin is definitely a hack and not the correct way to go. Btw, the result stays the same when I set the language to deo instead of de.
Thomas
Thomas A. Schmitz wrote:
On Nov 22, 2008, at 1:22 PM, Mojca Miklavec wrote:
PS: in LaTeX there is indeed a difference whether one uses pdfTeX or XeTeX/LuaTeX since the two engines load different patterns, but in ConTeXt I see no reason for a different behaviour.
Hmm, that's a nice understatement :-) Fact is that the wrong hyphenation only occurs when I compile my little test document with luatex; both xetex and pdftex give the expected result. Changing left|righthyphenmin is definitely a hack and not the correct way to go. Btw, the result stays the same when I set the language to deo instead of de.
what does xetex, pdftex, luatex report for:
{\de \the\lefthyphenmin blabla}
Mojca Miklavec wrote:
On Sat, Nov 22, 2008 at 12:35 PM, Thomas A. Schmitz wrote:
I suspect the error is not in lefthyphenmin and righthyphenmin, but in the patterns themselves. Which patterns does mkiv actually use, and how have they been produced?
It uses the patterns `dehypht-x' 2008-06-18 (WL)
Wait!!! Have de and deo been switched????
{ "de", "hyph-de-1901.tex", "german, old spelling" },
{ "deo", "hyph-de-1996.tex", "german, new spelling" },
hey, didn't *you* check that? lucky us that no german user noticed the difference
Even though this is probably not the reason for your problem since mkii and mkiv should use the same patterns(?), the two lines in mtxrun.lua need to be switched.
ok, i swapped them in mtx-patterns and swapped the de and deo files in my tree
Thomas A. Schmitz wrote:
On Nov 22, 2008, at 2:32 PM, Hans Hagen wrote:
what does xetex, pdftex, luatex report for:
{\de \the\lefthyphenmin blabla}
\the\lefthyphenmin is 2 in all engines
ok, luatex has a reimplemented hyphenation machinery so that may be a reason; another can be that we don't use the german patterns at all although i do see a difference ..
\setupcolors[state=start]
\starttext
\en \hyphenatedword{blabla hello smithonian bugs schmitzonian bugs schmitzlich}
\de \hyphenatedword{blabla hello smithonian bugs schmitzonian bugs schmitzlich}
\nl \hyphenatedword{blabla hello smithonian bugs schmitzonian bugs schmitzlich}
\sv \hyphenatedword{blabla hello smithonian bugs schmitzonian bugs schmitzlich}
\stoptext
Hans
On Sat, Nov 22, 2008 at 4:15 PM, Hans Hagen wrote:
Mojca Miklavec wrote:
On Sat, Nov 22, 2008 at 12:35 PM, Thomas A. Schmitz wrote:
I suspect the error is not in lefthyphenmin and righthyphenmin, but in the patterns themselves. Which patterns does mkiv actually use, and how have they been produced?
It uses the patterns `dehypht-x' 2008-06-18 (WL)
Wait!!! Have de and deo been switched????
{ "de", "hyph-de-1901.tex", "german, old spelling" },
{ "deo", "hyph-de-1996.tex", "german, new spelling" },
hey, didn't *you* check that?
I should have checked, yes. I did, but I overlooked this. I admit, it's all my fault (lucky me that I don't know the differences between the two orthographies well enough).
ok, i swapped them in mtx-patterns and swapped the de and deo files in my tree
There have recently been some more changes (for example Hungarian now includes way more extensive patterns, and Lithuanian and Latvian have been added, but there's no support for the two languages in ConTeXt anyway). Just FYI. (Just out of curiosity - why has { "agr", "hyph-grc", "ancient greek" } been commented out? My next question to Thomas would be how you handle those patterns in pdftex, but I won't ask that.)
ok, luatex has a reimplemented hyphenation machinery so that may be a reason; another can be that we don't use the german patterns at all although i do see a difference ..
My fear is that there could be some tiny difference in that reimplementation of the hyphenation algorithm. Most words hyphenate properly and equally in both engines. This is the first counter-example that I have seen. But that might be something for Taco to check.
Mojca
On Nov 22, 2008, at 4:45 PM, Mojca Miklavec wrote:
My fear is that there could be some tiny difference in that reimplementation of the hyphenation algorithm. Most words hyphenate properly and equally in both engines. This is the first counter-example that I have seen. But that might be something for Taco to check.
Mojca
To be honest: this is something that has been bugging me for quite a while, I just took some time to realize what the problem was. In all my texts, the letters "ch" get hyphenated at the end, and many German adjectives end in "ch" (see "lich" in Hans's example). What I don't understand: doesn't mkiv/luatex use the same patterns that XeTeX uses? Then why the deuce does it show different hyphenation? For German users, this is pretty serious: a hyphenation like "niedli-ch" is really bad and is correct in neither traditional nor modernized German spelling.
Thomas
Thomas A. Schmitz wrote:
What I don't understand: doesn't mkiv/luatex use the same patterns that XeTeX uses? Then why the deuce does it show different hyphenation? For German users, this is pretty serious: a hyphenation like "niedli-ch" is really bad and is correct in neither traditional nor modernized German spelling.
This must be a bug in luatex, hyphenation is supposed to be identical but the whole algorithm is redone, and obviously not flawlessly.
It seems there is (at least) a problem with all patterns that are supposed to end a word. For example, the "4ch." that is supposed to prevent i-ch appears to be ignored (at first glance, it looks like it is interpreted as "c4h.", and likewise for all other patterns with a trailing ".").
I will investigate further next week, at the office.
Best wishes, Taco
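To make the difference between "4ch." and "c4h." concrete, here is a minimal Python sketch of Liang-style pattern matching. It is an illustration under stated assumptions, not LuaTeX's implementation: only the pattern "4ch." comes from this thread, while "i1c" is a made-up stand-in for whatever pattern in the real German set permits a break before "c", and the function names are invented for this example.

```python
import re


def parse_pattern(pattern):
    """Split a pattern such as '4ch.' into its letters ('ch.') and the list
    of values sitting in the slots before, between and after those letters."""
    letters = re.sub(r"\d", "", pattern)
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            values[pos] = int(ch)
        else:
            pos += 1
    return letters, values


def hyphenate(word, patterns, left_min=2, right_min=2):
    """Apply the patterns to the word; the highest value wins in each slot,
    and an odd winning value allows a break at that slot."""
    padded = "." + word.lower() + "."      # '.' marks the word boundaries
    scores = [0] * (len(padded) + 1)       # scores[k] is the slot before padded[k]
    for pattern in patterns:
        letters, values = parse_pattern(pattern)
        start = padded.find(letters)
        while start != -1:
            for offset, value in enumerate(values):
                scores[start + offset] = max(scores[start + offset], value)
            start = padded.find(letters, start + 1)
    pieces, last = [], 0
    for i in range(left_min, len(word) - right_min + 1):
        if scores[i + 1] % 2 == 1:         # slot before word[i]
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return "-".join(pieces)


# "i1c" is hypothetical; "4ch." is the pattern mentioned in the thread.
print(hyphenate("sich", ["i1c", "4ch."]))      # word-final pattern read correctly
print(hyphenate("sich", ["i1c", "c4h."]))      # the suspected misreading
print(hyphenate("niedlich", ["i1c", "c4h."]))
```

In the sketch, the even value 4 in "4ch." overrides the odd value in the slot before "c", so no break is allowed there. If the digit lands between "c" and "h" instead, the slot before "c" keeps its odd value and the break comes back, which is consistent with the "si-ch" and "niedli-ch" results reported above.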
On Nov 22, 2008, at 6:56 PM, Taco Hoekwater wrote:
This must be a bug in luatex, hyphenation is supposed to be identical but the whole algorithm is redone, and obviously not flawlessly.
It seems there is (at least) a problem with all patterns that are supposed to end a word. For example, the "4ch." that is supposed to prevent i-ch appears to be ignored (at first glance, it looks like it is interpreted as "c4h.", and likewise for all other patterns with a trailing ".").
I will investigate further next week, at the office.
Best wishes, Taco
Excellent, Taco, you're on the case! Looking forward to hearing about your little investigation - "elementary, my dear Watson!" :)
All best
Thomas
Thomas A. Schmitz wrote:
On Nov 22, 2008, at 6:56 PM, Taco Hoekwater wrote:
This must be a bug in luatex, hyphenation is supposed to be identical but the whole algorithm is redone, and obviously not flawlessly.
It seems there is (at least) a problem with all patterns that are supposed to end a word. For example, the "4ch." that is supposed to prevent i-ch appears to be ignored (at first glance, it looks like it is interpreted as "c4h.", and likewise for all other patterns with a trailing ".").
I will investigate further next week, at the office.
Best wishes, Taco
Excellent, Taco, you're on the case! Looking forward to hearing about your little investigation - "elementary, my dear Watson!" :)
I wouldn't say it was elementary, but it is fixed now. Sometime later this week I will create a 0.30.3 (as this is a grave bug), but if you want to verify: the fix is in the source repository (#1576-1578).
Best wishes, Taco
On Nov 24, 2008, at 2:39 PM, Taco Hoekwater wrote:
I wouldn't say it was elementary, but it is fixed now. Sometime later this week I will create a 0.30.3 (as this is a grave bug), but if you want to verify: the fix is in the source repository (#1576-1578).
Best wishes, Taco
Hi Taco,
of course I was curious and tried to compile the trunk, but I get an error:
/usr/bin/ar rv libopenbsd-compat.a bsd-asprintf.o bsd-snprintf.o strlcat.o strlcpy.o strsep.o strtonum.o strtoll.o strtoul.o
ar: creating archive libopenbsd-compat.a
a - bsd-asprintf.o
a - bsd-snprintf.o
a - strlcat.o
a - strlcpy.o
a - strsep.o
a - strtonum.o
a - strtoll.o
a - strtoul.o
ranlib: file: libopenbsd-compat.a(bsd-asprintf.o) has no symbols
ranlib: file: libopenbsd-compat.a(bsd-snprintf.o) has no symbols
ranlib: file: libopenbsd-compat.a(strlcat.o) has no symbols
ranlib: file: libopenbsd-compat.a(strlcpy.o) has no symbols
ranlib: file: libopenbsd-compat.a(strsep.o) has no symbols
ranlib: file: libopenbsd-compat.a(strtoll.o) has no symbols
ranlib: file: libopenbsd-compat.a(strtoul.o) has no symbols
ranlib libopenbsd-compat.a
ranlib: file: libopenbsd-compat.a(bsd-asprintf.o) has no symbols
ranlib: file: libopenbsd-compat.a(bsd-snprintf.o) has no symbols
ranlib: file: libopenbsd-compat.a(strlcat.o) has no symbols
ranlib: file: libopenbsd-compat.a(strlcpy.o) has no symbols
ranlib: file: libopenbsd-compat.a(strsep.o) has no symbols
ranlib: file: libopenbsd-compat.a(strtoll.o) has no symbols
ranlib: file: libopenbsd-compat.a(strtoul.o) has no symbols
mkdir -p ../../libs/lua51 && cd ../../libs/lua51 && cp -f ../../../src/texk/web2c/../../libs/lua51/* . && make posix
Makefile:25: *** missing separator. Stop.
make: *** [../../libs/lua51/liblua.a] Error 2
Hope this is nothing too serious...
All best, and thanks
Thomas
Thomas A. Schmitz wrote:
mkdir -p ../../libs/lua51 && cd ../../libs/lua51 && cp -f ../../../src/texk/web2c/../../libs/lua51/* . && make posix
Makefile:25: *** missing separator. Stop.
make: *** [../../libs/lua51/liblua.a] Error 2
Hope this is nothing too serious...
It looks like your checkout is not complete (locally edited files?), line 25 of libs/lua51/Makefile should now be a commented-out line:
#COCOCFLAGS= -DCOCO_USE_SETJMP
Best wishes, Taco
On Nov 24, 2008, at 3:11 PM, Taco Hoekwater wrote:
It looks like your checkout is not complete (locally edited files?), line 25 of libs/lua51/Makefile should now be a commented-out line:
#COCOCFLAGS= -DCOCO_USE_SETJMP
Best wishes, Taco
Strange. I deleted the Makefile and let svn regenerate it, and now compilation works. And the wrong hyphenations are gone! Hooray and three cheers, thanks Taco!
All best
Thomas
participants (4)
- Hans Hagen
- Mojca Miklavec
- Taco Hoekwater
- Thomas A. Schmitz