Hello, the following PlainTeX document produces unexpected results while hyphenating the word "Streifzüge": -------------------------------------------------------------------- \hyphenation{Streif-zü-ge} Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge \showhyphens{Streifzüge} \showhyphens{Streif\-zü\-ge} \bye -------------------------------------------------------------------- This is LuaTeX, Version snapshot-0.25.0-2008031419 (Web2C 7.5.6) (format=luatex 2008.3.14) 30 MAR 2008 22:29 **HyphenationProblem.tex (HyphenationProblem.tex Overfull \hbox (5.02353pt too wide) in paragraph at lines 5--9 \tenrm Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge| \hbox(6.94444+1.94444)x469.75499, glue set - 1.0, direction TLT .\tenrm S .\tenrm t .\tenrm r .\tenrm e .\tenrm i .etc. Underfull \hbox (badness 10000) in paragraph at lines 15--15 [][] \tenrm Streifzüge \hbox(6.94444+1.94444)x16383.99998, glue set 9793.94307, direction TLT [] Underfull \hbox (badness 10000) in paragraph at lines 17--17 [][] \tenrm Streif-zü-ge \hbox(6.94444+1.94444)x16383.99998, glue set 9793.94307, direction TLT [] [1] ) Output written on HyphenationProblem.dvi (1 page, 1052 bytes). 
-------------------------------------------------------------------- As you can see, I explicitly specify the hyphenation points for "Streifzüge", but for some reason LuaTeX does not hyphenate it automatically. Only when I specify the hyphenation points in the text, does the word get hyphenated. Am I doing something wrong? Jonathan
Jonathan Sauer wrote:
Hello, --------------------------------------------------------------------
As you can see, I explicitly specify the hyphenation points for "Streifzüge", but for some reason LuaTeX does not hyphenate it automatically. Only when I specify the hyphenation points in the text, does the word get hyphenated.
Am I doing something wrong?
You are missing two things:
* the \lccode of ü has to be nonzero for it to be a valid word constituent
* The font cmr10 doesn't have an ü glyph
With the addition of
\lccode`ü=`ü
\font\tenrm=ec-lmr10 \tenrm
I get proper typesetting (with hyphenation) of your test file.
Best wishes, Taco
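For reference, a minimal sketch of the test file with both changes applied might look as follows. It assumes a UTF-8 encoded source and that the ec-lmr10 font metrics are installed; the shortened paragraph and the comments are only illustrative and not part of the original file:
--------------------------------------------------------------------
% Give ü a nonzero \lccode so that it counts as a word constituent.
\lccode`ü=`ü
% cmr10 has no ü glyph, so switch \tenrm to a font that has one.
% (Plain TeX's \showhyphens typesets in \tenrm as well.)
\font\tenrm=ec-lmr10 \tenrm
\hyphenation{Streif-zü-ge}

Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge
Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge
Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge
Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge

\showhyphens{Streifzüge}
\bye
--------------------------------------------------------------------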
Hello,
Am I doing something wrong?
You are missing two things:
* the \lccode of ü has to be nonzero for it to be a valid word constituent
Oh. I thought since hyphenation has been completely revamped, the requirement of a non-zero \lccode had been removed. Especially since I did not get an error message (IIRC, the original TeX complained about this, at least in \patterns).
* The font cmr10 doesn't have an ü glyph
Of course. But this should not affect the possible hyphenation points, or should it?
Best wishes, Taco
Jonathan
"Jonathan Sauer"
Hello,
Am I doing something wrong?
You are missing two things:
* the \lccode of ü has to be nonzero for it to be a valid word constituent
Oh. I thought since hyphenation has been completely revamped, the requirement of a non-zero \lccode had been removed. Especially since I did not get an error message (IIRC, the original TeX complained about this, at least in \patterns).
That's what I thought as well. Clarification would be welcome. -- David Kastrup
Hi, Jonathan Sauer wrote:
Hello,
Am I doing something wrong?
You are missing two things:
* the \lccode of ü has to be nonzero for it to be a valid word constituent
Oh. I thought since hyphenation has been completely revamped, the requirement of a non-zero \lccode had been removed. Especially
It actually was gone for a while (\lccodes were not in the new hyphenation codebase in the initial rewrite), but then I realised that that gives big problems with words followed by punctuation. So I had to reinsert the code in the hyphenation algorithm.
since I did not get an error message (IIRC, the original TeX complained about this, at least in \patterns).
I never bothered to restore the error (or at least not for \hyphenation), because it was not all that useful an error to begin with: it is only a potential error at this point, since a following bit of code can still change the \lccodes if it wants to (even revert them to zero). A better solution for this 'what is a word' problem would be nice. Perhaps one will be offered by the Google Summer of Code; one of the proposed projects is "Better unicode support".
* The font cmr10 doesn't have an ü glyph
Of course. But this should not affect the possible hyphenation points, or should it?
No, it shouldn't. Best wishes, Taco
Hello,
Am I doing something wrong?
You are missing two things:
* the \lccode of ü has to be nonzero for it to be a valid word constituent
Now it works perfectly. Thanks!
A better solution for this 'what is a word' problem would be nice. Perhaps one will be offered by the Google Summer of Code; one of the proposed projects is "Better unicode support".
Why not check for catcode 11? SCNR, Jonathan
Jonathan Sauer wrote:
Hello,
Am I doing something wrong?
You are missing two things:
* the \lccode of ü has to be nonzero for it to be a valid word constituent
Now it works perfectly. Thanks!
A better solution for this 'what is a word' problem would be nice. Perhaps one will be offered by the Google Summer of Code; one of the proposed projects is "Better unicode support".
Why not check for catcode 11?
It would still be the same problem, only with a different variable. And \lccode could not be removed even then, because it is needed for downcasing words. Otherwise, all pattern files would have to contain all patterns for uppercase letters as well.
Best wishes, Taco
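To illustrate the downcasing point, here is a small sketch. It assumes the same UTF-8 source and ec-lmr10 font as in the fix earlier in this thread; the word "Überzüge" is a made-up example that does not appear in the original test file, and the behaviour described in the comments is what one would expect rather than verified output:
--------------------------------------------------------------------
\lccode`ü=`ü  % the lowercase letter maps to itself
\lccode`Ü=`ü  % the uppercase letter is downcased to ü for lookup
\font\tenrm=ec-lmr10 \tenrm
% Only the lowercase form is stored here; the capitalised word should
% be matched against it after downcasing via \lccode.
\hyphenation{über-zü-ge}
\showhyphens{überzüge Überzüge}
\bye
--------------------------------------------------------------------
Without the \lccode mapping for Ü, the exception list (and any pattern file) would presumably need a separate entry for the capitalised form.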
"Jonathan Sauer"
Hello,
the following PlainTeX document produces unexpected results while hyphenating the word "Streifzüge":
--------------------------------------------------------------------
\hyphenation{Streif-zü-ge}
Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge Streifzüge
Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge Streif\-zü\-ge
\showhyphens{Streifzüge}
\showhyphens{Streif\-zü\-ge}
\bye
Actually, there is not much unexpected to me except that the \hyphenation command gets accepted in spite of ü being a non-letter. Compare with the results when using standard TeX. -- David Kastrup