------------------------------------------------------------------------
r3854 | taco | 2010-09-04 11:23:41 +0200 (Sat, 04 Sep 2010) | 8 lines
Changed paths:
   M /trunk/manual/luatexref-t.pdf
   M /trunk/manual/luatexref-t.tex
   M /trunk/source/texk/web2c/luatexdir/lang/texlang.w
   M /trunk/source/texk/web2c/luatexdir/tex/maincontrol.w

* insertion of discretionaries following explicit hyphen characters
  now happens in hnj_hyphenation(), not earlier
* the main control loop has been simplified accordingly
* hyphenation exceptions are now case sensitive
* it is now possible to add extra hyphenation points to a compound
  word containing explicit hyphens via a hyphenation exception
------------------------------------------------------------------------
Patrick Gundlach wrote:
[changes from svn/cron]
* hyphenation exceptions are now case sensitive
Does that mean that if I write
\hyphenation{man-u-script}
"Manuscript" will not be covered?
Yes, exactly. It seemed a reasonable thing to do, considering that it is considerably easier to add another exception than it is to prevent an unwanted hyphenation if it is defined by an exception.

Best wishes,
Taco
While I don't have an opinion at all on whether this is good or not, won't this a) break TeX compatibility and b) make existing documents (hyphenation exceptions) "fail"? I am _not_ asking to change anything; I am just wondering and trying to understand TeX.

Patrick
On 09/05/2010 02:55 PM, Patrick Gundlach wrote:
While I don't have an opinion at all on whether this is good or not, won't this
a) break TeX compatibility and
Yeah, sure it does. But LuaTeX's language handling is already incompatible in other ways, so that is not a big deal.
b) will make existing documents (hyphenation exceptions) "fail" ?
Yes, for example:

bla bla bla. Declination is ....

will not use the standard hyphen.tex exception for "Declination".

Personally, I think this is acceptable. But if there are objections, I can go back to being case-insensitive; it is not that important.

Best wishes,
Taco
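The behaviour discussed in this exchange can be modelled with a small lookup table. The Python sketch below is only an illustration and is not how LuaTeX stores exceptions: the function names `build_exceptions` and `lookup` are hypothetical, and the only real data is the `dec-li-na-tion` entry from hyphen.tex. It shows the capitalised word missing the exception under case-sensitive lookup, and a second exception restoring coverage.

```python
def build_exceptions(entries, case_sensitive):
    """Map each word (hyphens removed) to its hyphenated form.

    Hypothetical helper for illustration; not LuaTeX's data structure.
    """
    table = {}
    for entry in entries:
        key = entry.replace("-", "")
        if not case_sensitive:
            key = key.lower()
        table[key] = entry
    return table


def lookup(table, word, case_sensitive):
    """Return the stored exception for word, or None if there is none."""
    key = word if case_sensitive else word.lower()
    return table.get(key)


# hyphen.tex lists this exception in lower case only.
entries = ["dec-li-na-tion"]

sensitive = build_exceptions(entries, case_sensitive=True)
insensitive = build_exceptions(entries, case_sensitive=False)

print(lookup(sensitive, "declination", True))     # dec-li-na-tion
print(lookup(sensitive, "Declination", True))     # None
print(lookup(insensitive, "Declination", False))  # dec-li-na-tion

# Adding a second, capitalised exception restores coverage.
sensitive = build_exceptions(entries + ["Dec-li-na-tion"], case_sensitive=True)
print(lookup(sensitive, "Declination", True))     # Dec-li-na-tion
```

The last step mirrors the argument that adding another exception is the easy direction; suppressing an unwanted match would need a different mechanism.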
On Sun, Sep 05, 2010 at 04:35:16PM +0200, Taco Hoekwater wrote:
Personally, I think this is acceptable. But if there are objections, I can go back to being case-insensitive; it is not that important.
Since we are using different hyphenation patterns anyway (maybe except for English), I think it is better to adapt the hyphenation patterns. I don't even see a reason for defaulting to Knuth's patterns for English, since the output is likely to be different anyway.

Regards,
Khaled

--
Khaled Hosny
Arabic localiser and member of Arabeyes.org team
Free font developer
Since we are using different hyphenation patterns anyway
Not really, apart from German, for which patterns are being actively updated and tested. For most of the other languages we still use patterns that are sometimes several decades old and not really maintained by anyone. The only difference between UTF-8 TeX engines and 8-bit engines is that for the latter the patterns are converted to an appropriate 8-bit encoding on the fly. In addition, LuaTeX can load patterns on demand during each run, but it still uses the same patterns.

Arthur
On 05.09.2010 16:35, Taco Hoekwater wrote:
On 09/05/2010 02:55 PM, Patrick Gundlach wrote:
b) will make existing documents (hyphenation exceptions) "fail" ?
Yes, for example:
bla bla bla. Declination is ....
will not use the standard hyphen.tex exception for "Declination".
What about patterns, are they going to be case-sensitive, too? That could resolve one class of homonyms in the German language, e.g.,

spie-len-de vs. Spiel-en-de

At the same time, I think the number of levels used (we are at level 8 already), the average pattern length, and the number of matching patterns per word could decrease, because case-sensitive patterns should be more specific than case-insensitive ones. All at the cost of more patterns, of course, but I haven't done any tests so far. I'll report back if I have some numbers.
Personally, I think this is acceptable. But if there are objections, I can go back to being case-insensitive; it is not that important.
A configuration option seems sensible that, if set to case-insensitive, lower-cases words before pattern matching, effectively ignoring patterns containing upper-case letters (lower-casing the patterns, too, could generate conflicting patterns). I have no preference for the default behaviour, though.

Hyphenation quality would go down in both mismatched cases: when case-insensitive patterns are used with a case-sensitive pattern matching strategy, and vice versa. But the option would preserve (a way for) backwards compatibility, i.e., it would maintain the current level of hyphenation quality when existing patterns are used.

Best regards,
Stephan Hennig
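The proposed option, and the conflict that lower-casing the patterns would cause, can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not an actual or planned LuaTeX interface: whole-word entries stand in for real hyphenation patterns, and the names `conflicting_keys` and `hyphenate` are hypothetical.

```python
def conflicting_keys(entries):
    """Return lower-cased keys that map to more than one hyphenation.

    Hypothetical helper: shows why lower-casing the entries themselves
    (rather than skipping mixed-case ones) can create conflicts.
    """
    seen = {}
    for entry in entries:
        key = entry.replace("-", "").lower()
        seen.setdefault(key, set()).add(entry.lower())
    return {key: sorted(forms) for key, forms in seen.items() if len(forms) > 1}


def hyphenate(entries, word, case_sensitive):
    """Look up word; in case-insensitive mode skip mixed-case entries."""
    if case_sensitive:
        table = {e.replace("-", ""): e for e in entries}
        return table.get(word)
    # Case-insensitive mode: lower-case the word and ignore every entry
    # that contains an upper-case letter.
    lower_only = [e for e in entries if e == e.lower()]
    table = {e.replace("-", ""): e for e in lower_only}
    return table.get(word.lower())


# Whole words used as stand-ins for patterns (an assumption of this sketch).
entries = ["spie-len-de", "Spiel-en-de"]

print(conflicting_keys(entries))
# {'spielende': ['spie-len-de', 'spiel-en-de']}
print(hyphenate(entries, "Spielende", case_sensitive=True))   # Spiel-en-de
print(hyphenate(entries, "Spielende", case_sensitive=False))  # spie-len-de
```

In the case-sensitive mode a capitalised "Spielende" always gets the noun hyphenation, whatever its role in the sentence.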
What about patterns, are they going to be case-sensitive, too? That could resolve one class of homonyms in German language, e.g.,
spie-len-de vs. Spiel-en-de
"Spielende" can be hyphenated the first way if it starts at the beginning of the sentence:

"Spie-len-de Kinder sehe ich jeden Tag."

Patrick
On 06.09.2010 10:54, Patrick Gundlach wrote:
"Spielende" can be hyphenated the first way if it starts at the beginning of the sentence:
"Spie-len-de Kinder sehe ich jeden Tag."
Ouch, I didn't think about capitalisation at the beginning of a sentence! Since in German every word can appear at the beginning of a sentence, case-sensitive hyphenation would indeed introduce new errors (whereas our patterns currently have clear rules for which hyphenation to prefer for homonyms). Case-sensitive hyphenation seems to need grammatical analysis.

Best regards,
Stephan Hennig
Hi, On 09/06/10 10:47, Stephan Hennig wrote:
A configuration option seems sensible that, if set to case-insensitive, lower-cases words before pattern matching, effectively ignoring patterns
That would be getting silly. I will revert to case-insensitive exceptions.

Best wishes,
Taco
Cron Daemon wrote:
* it is now possible to add extra hyphenation points to a compound word containing explicit hyphens via a hyphenation exception
I do not quite understand this comment: what syntax is used to specify hyphenation for the word "pseudo-compatible", for example?

-- Rémy.
On Sun, Sep 05, 2010 at 01:02:44AM +0200, Oudompheng Rémy wrote:
I do not quite understand this comment: what syntax is used to specify hyphenation for the word "pseudo-compatible" for example ?
I think it is explained in the manual (which was updated in that commit). (I really appreciate that LuaTeX has had a reference manual right from the start, one that is actively maintained and not an afterthought.)

Regards,
Khaled
On 09/05/2010 01:02 AM, Oudompheng Rémy wrote:
I do not quite understand this comment: what syntax is used to specify hyphenation for the word "pseudo-compatible" for example ?
Please rtfm, it is not just my diary. Anyway:

\hyphenation{pseudo{-}{}{-}compa-tible}

Best wishes,
Taco
participants (9)
- Arthur Reutenauer
- Khaled Hosny
- Martin Schröder
- Oudompheng Rémy
- Patrick Gundlach
- root@mail.boekplan.nl
- Stephan Hennig
- taco
- Taco Hoekwater