lualatex (incorrectly) replaces em dashes with hyphens
Hi folks,

The current version of lualatex distributed with TeX Live (as installed today via TeX Live's install-tl, or via Arch Linux's packages) replaces em dashes with hyphens if they're not separated by any space. For example, if given

\documentclass{article}
% This makes no difference:
%\usepackage{fontspec}
%\setmainfont[Ligatures=TeX]{Latin Modern Roman}
%
\begin{document}
\noindent%
I have \emph{no} idea why em dashes---these things---are being replaced with hyphens in Lua\LaTeX. But --- this only seems to happen if there is no space around them.
\end{document}

lualatex replaces the first two dashes with hyphens, but not the third. The problem goes away when I downgrade to a version from December (Arch's previous release of their texlive-core package was built December 1), and neither pdflatex nor xelatex exhibits this behavior.
lualatex replaces the first two dashes with hyphens, but not the third.
Imho that's a luaotfload problem (in context and with the plain fontloader it seems to work fine), so I added an issue there:

https://github.com/u-fischer/luaotfload/issues/44

--
Kind regards
Ulrike Fischer
The current version of lualatex distributed with TeX Live (as installed today via TeX Live's install-tl, or via Arch Linux's packages) replaces em dashes with hyphens if they're not separated by any space.
It looks as if I was wrong. It is not a luaotfload problem, but one of the generic fontloader. It doesn't look as if this will change, so you have these options:

1. Add \automatichyphenmode=1

This will avoid the error, at the cost that there will no longer be a possible line break point after the em dash. (Perhaps we will make this the default. It will also prevent line breaks after the hyphen, e.g. for $p$-adic, and when the hyphen is at the beginning of the word, something that is good for some languages.)

2. Input the em dash directly: —. I wouldn't like this much, as I can type --- much faster than —, and --- is also more visible, but it could be that it is your best option.

--
Kind regards
Ulrike Fischer
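The two options above could look like this in a complete document. This is only a sketch: the \ifdefined guard is an addition (not from the thread) so that the file still compiles under pdflatex and xelatex, and \textemdash is shown as a standard LaTeX alternative to typing the Unicode character directly.

```latex
\documentclass{article}
% Option 1: LuaTeX only. The guard is an assumption so other engines skip it.
\ifdefined\automatichyphenmode
  \automatichyphenmode=1
\fi
\begin{document}
\noindent
Ligature input: em dashes---these things---with no surrounding space.

% Option 2: bypass the --- ligature entirely.
Direct input: em dashes—these things—typed as U+2014.

Command form: em dashes\textemdash these things\textemdash via a macro.
\end{document}
```

Comparing the three lines in the output should show whether the ligature path is the only one affected on a given installation.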
It looks as if I was wrong. It is not a luaotfload problem, but of the generic fontloader. It doesn't look as if this will change, so you have the options
1. Add \automatichyphenmode=1
This will avoid the error at the cost that there will be no longer a possible line break point after the em dash (perhaps we will make this the default. It will also prevent line breaks after the hyphen e.g. for $p$-adic and if the hyphen is at the begin of the word, something that is good for some languages.)
2. Input the emdash directly: —. I wouldn't like this much, as I can type --- much faster than —, also --- is more visible, but it could be that it is your best option.
I suppose I'll go with option 1 for now, but it seems very unfortunate that luatex no longer behaves like every other TeX engine wrt this simple behavior. This is a large breaking change - it affects every existing document that doesn't explicitly set `\automatichyphenmode` or use Unicode em dashes. May I ask what recent work caused this?
I suppose I'll go with option 1 for now,
I discussed this on the latex list, and quite probably the next latex format will set \automatichyphenmode=1 too.
but it seems very unfortunate that luatex no longer behaves like every other TeX engine wrt this simple behavior.
Well, I also made a few tests with pdftex and xetex (which has a similar parameter, \XeTeXdashbreakstate), and there is no "like every other engine" behaviour. They all have slight differences in how they handle this. If you want to try, check the document below. And this stuff is not simple. --- is a ligature, and ligatures cannot be fixed before hyphenation, as luatex must be able to split them there. The ligature --- has the additional problem that it consists of hyphens, which play a special role in word breaking. It is also not possible to replace them earlier, e.g. at input, as you normally don't want such a replacement for tt fonts, so it must be font specific.
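To illustrate why the replacement has to be font specific: with fontspec, the TeX-style dash ligatures are a per-font feature. A minimal sketch, assuming the Latin Modern OpenType fonts are installed and the file is compiled with lualatex or xelatex. The explicit Ligatures=TeX on the main font only spells out what fontspec normally enables for roman and sans fonts; the mono font is loaded without it, so --- is expected to stay as three hyphens there.

```latex
\documentclass{article}
\usepackage{fontspec}
% Dash ligatures requested explicitly for the text font.
\setmainfont{Latin Modern Roman}[Ligatures=TeX]
% No Ligatures=TeX here: typewriter text should keep literal hyphens.
\setmonofont{Latin Modern Mono}
\begin{document}
Roman: dash---dash and dash--dash

\texttt{Mono: dash---dash and dash--dash}
\end{document}
```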
This is a large breaking change
While I understand that this is annoying: luatex and the fontloader code are still a moving target, and such changes cannot always be avoided. Also, while I'm trying to build a large enough test pool, there are still a number of areas where we can't catch changes. Just to warn you: the luaotfload version I just uploaded will change letterspacing in an incompatible way, so if you are using it, check the documentation about your options.
- it affects every existing document that doesn't explicitly set `\automatichyphenmode` or use Unicode em dashes. May I ask what recent work caused this?
I didn't try to find out; it was some change in the generic fontloader we import from context. context has the same problem if \automatichyphenmode is set to zero:

\starttext
\automatichyphenmode=0
dash---dash %changes to endash
dash--- dash
dash ---dash %ok
\stoptext

so you could ask on the context list; if they change it, we will pick it up.

% test file for hyphen/en-/em-dash handling:
\documentclass{article}
\textwidth=2pt
\begin{document}
% luatex: set \automatichyphenmode= 0/1/2
% xetex: set \XeTeXdashbreakstate=0/1
-hyphen hyph-hyph
ndash--ndash ndash–ndash
endash--endash endash–endash
mdash---mdash mdash—mdash
\mbox{mdash---mdash} %not ok in luatex with value 0
\end{document}

--
Kind regards
Ulrike Fischer
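To run such a test under several engines without editing the parameter lines each time, the setting could be selected per engine. A sketch, assuming the iftex package is available; the macro names \luadashmode and \xedashmode are made up for this example, and the two parameters do not share the same value semantics, so each one is set separately.

```latex
\documentclass{article}
\usepackage{iftex}
% Hypothetical helper macros: edit these two values for each test run.
\newcommand*\luadashmode{1}% value for \automatichyphenmode (0/1/2)
\newcommand*\xedashmode{1}%  value for \XeTeXdashbreakstate (0/1)
% Only the branch for the running engine is executed.
\ifLuaTeX \automatichyphenmode=\luadashmode\relax \fi
\ifXeTeX  \XeTeXdashbreakstate=\xedashmode\relax \fi
\textwidth=2pt
\begin{document}
hyph-hyph ndash--ndash mdash---mdash \mbox{mdash---mdash}
\end{document}
```

Under pdflatex neither branch runs, so the same file gives the baseline for comparison.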
Thanks for the in-depth explanation, and for experimenting to see how xetex and friends handle this.
Well I made a few tests also with pdftex and xetex (which has a similar value \XeTeXdashbreakstate) and there is no "like every other engine" behaviour. They all have slight differences how they handle this. If you want to try check the document below.
And this stuff is not simple[...]
While I understand that this is annoying: luatex and the fontloader code is still a moving target, and such changes can not always be avoided[...]
I appreciate that this is a _very_ complicated task, that one shouldn't expect {lua,xe,pdf}tex to work identically, and that these projects are under active development. With that said, I respectfully point out that "--- gives an em dash" is prescribed in almost every TeX and LaTeX reference you can find, including Knuth's TeXbook, Lamport's LaTeX: ADPS, and popular online guides. A change like this is sure to cause a mismatch in expectations for most folks: anybody who doesn't actively follow TeX development on the mailing lists (which I presume is most users) will likely be surprised and frustrated that this decades-old convention suddenly stops working after recent updates.
so you could ask on the context list, if they change it, we will pick it up.
I'll try that, thanks. P.S. I wasn't CC'd on your latest response (and I'm not subscribed to dev-luatex), so I'm replying to the last one I got. Apologies if this mixes up the order in the mailing list a bit.
On Tue, Feb 26, 2019 at 08:56:05PM -0800, Matt Kline wrote:
With that said, I respectfully point out that "--- gives an em dash" is prescribed in almost every TeX and LaTeX reference you can find, including Knuth's TeXbook, Lamport's LaTeX: ADPS, and popular online guides. A change like this is sure to cause a mismatch in expectations for most folks: anybody who doesn't actively follow TeX development on the mailing lists (which I presume is most users) will likely be surprised and frustrated that this decades-old convention suddenly stops working after recent updates.
They are implemented as ligatures in Computer Modern fonts, so it is not unreasonable for them to not work with any other font that does not have such ligatures. XeTeX and LuaTeX providing ways to emulate this with other fonts is already going out of their way, since even Knuth's TeX does not do this (if the font does not have a ligature it will do nothing). Regards, Khaled
On 28.02.19 at 11:45, Khaled Hosny wrote:
They are implemented as ligatures in Computer Modern fonts, so it is not unreasonable for them to not work with any other font that does not have such ligatures. XeTeX and LuaTeX providing ways to emulate this with other fonts is already going out of their way, since even Knuth's TeX does not do this (if the font does not have a ligature it will do nothing).
true, but it is also correct that in the last 3 decades essentially any font usable with TeX (well LaTeX) had the ligatures, so the statement in the TeX book that --- is the way to get an emdash was/is true with virtually all fonts in use in pdftex. So it is not just a quirk of CM fonts, it is largely a feature of the ecosystem (or was in the days of Type 1 fonts) and as such it is a pity imho if it becomes a breaking change when processing a document with a unicode engine. frank
On Thu, Feb 28, 2019 at 11:57:01AM +0100, Frank Mittelbach wrote:
true, but it is also correct that in the last 3 decades essentially any font usable with TeX (well LaTeX) had the ligatures, so the statement in the TeX book that --- is the way to get an emdash was/is true with virtually all fonts in use in pdftex. So it is not just a quirk of CM fonts, it is largely a feature of the ecosystem (or was in the days of Type 1 fonts) and as such it is a pity imho if it becomes a breaking change when processing a document with a unicode engine.
But the engines didn’t change at all in that respect: the actual change that XeTeX and LuaTeX brought is that all of a sudden it became possible to use many more fonts than before, most of which didn’t have the ligatures that TeX users had come to expect. Arthur
They are implemented as ligatures in Computer Modern fonts, so it is not unreasonable for them to not work with any other font that does not have such ligatures.
Yes, and this is imho a problem of OpenType fonts. I find it very difficult to handle sources which use the real Unicode chars, as they are barely distinguishable.
XeTeX and LuaTeX providing ways to emulate this with other fonts is already going out of their way,
I'm glad that the engines offer workarounds for this missing feature in OpenType fonts. But apart from that, this is not the point: the fontloader *has* an emulation, and this emulation clearly has a bug at the moment. But there is hope: Marcel just sent a pull request which hopefully resolves the problem:

https://github.com/u-fischer/luaotfload/pull/45

--
Kind regards
Ulrike Fischer
They are implemented as ligatures in Computer Modern fonts, so it is not unreasonable for them to not work with any other font that does not have such ligatures.
A couple of points:

1. It seems reasonable to expect Unicode-aware engines (luatex and xetex) to make this substitution, especially with the Ligatures=TeX option given to fontspec. (Isn't this the entire point of that option?)

2. Regardless of that debate, my example in the previous email shows the bug in action with the default Latin Modern fonts. Surely those should perform the substitution.
But beside this, it is not the point: the fontloader *has* an emulation and this emulation has clearly currently a bug.
But there is hope: Marcel just sent a pull request which hopefully resolves the problem:
Great news!
On Thu, Feb 28, 2019 at 01:46:48PM -0800, Matt Kline wrote:
1. It seems reasonable to expect Unicode-aware engines (luatex and xetex) to make this substitution, especially with the Ligatures=TeX option given to fontspec. (Isn't this the entire point of that option?)
The point is for users to have the option, not necessarily that it should be on by default. Arthur
On Fri, 1 Mar 2019 14:15:30 +0100, Arthur Reutenauer wrote:
1. It seems reasonable to expect Unicode-aware engines (luatex and xetex) to make this substitution, especially with the Ligatures=TeX option given to fontspec. (Isn't this the entire point of that option?)
The point is for users to have the option, not necessarily that it should be on by default.
There are millions of tex documents, bib files and other tex-related files around, all using -- and --- to get an en dash and an em dash. Why should we set up defaults that break this input as soon as people try to convert a document to use xelatex or lualatex? Which user would say that it is an improvement that an input which previously gave an en dash now prints --? Do you want to answer all the questions on tex.sx about how to activate tlig again?

--
Ulrike Fischer
https://www.troubleshooting-tex.de/
Ulrike, On Fri, Mar 01, 2019 at 03:59:54PM +0100, Ulrike Fischer wrote:
Why should we setup defaults that breaks this input as soon as people try to convert a document to use xelatex or lualatex?
You write “we”, as if this had been a conscious decision by a group of people. But you must know, as well as I do, that no such thing ever happened. At the very best one single person made some half-hearted decision at some point, when developing the engine, the format, or the package.
Which user would say that it is an improvement that an input which previously gave an endash now prints --?
If you think you can speak on behalf of users, please go ahead. I’m the vice president of the TeX Users Group and wouldn’t ever presume to wager what users would say. Most of what we do is based on blind guesses.
Do you want to answer all the questions on tex.sx about how to activate the tlig again?
No. By the way I can’t, because I don’t answer these questions in the first place :-) Best, Arthur
On Fri, 1 Mar 2019 17:19:50 +0100, Arthur Reutenauer wrote:
Why should we setup defaults that breaks this input as soon as people try to convert a document to use xelatex or lualatex?
You write “we”, as if this had been a conscious decision by a group of people.
Sure it was. Code like \def\UnicodeFontTeXLigatures{+tlig;} doesn't get added by accident to the latex format. Also I was even part of some of the discussion.
Which user would say that it is an improvement that an input which previously gave an endash now prints --?
If you think you can speak on behalf of users, please go ahead. I’m the vice president of the TeX Users Group and wouldn’t ever presume to wager what users would say. Most of what we do is based on blind guesses.
Well, I have to wager what users would say. As a member of the latex team and as maintainer of luaotfload I have to decide things. We can't leave the question of which defaults should go in the format in some sort of Schrödinger's cat state. But I'm quite confident that my decisions are better than blind guesses. I really have a lot of experience with users and documents of all kinds.

--
Ulrike Fischer
https://www.troubleshooting-tex.de/
On Fri, Mar 01, 2019 at 07:00:47PM +0100, Ulrike Fischer wrote:
Code like \def\UnicodeFontTeXLigatures{+tlig;} doesn't get added by accident to the latex format. Also I was even part of some of the discussion.
Then what are you complaining about? The format is behaving exactly as you expect it to be. You have the power to change it, and you’re confident that you made the right decision on behalf of users. Arthur
participants (6)
- Arthur Reutenauer
- Frank Mittelbach
- Khaled Hosny
- Matt Kline
- Ulrike Fischer
- Ulrike Fischer