decomposed and precomposed glyphs
When using input like the following with xetex, harfbuzz kicks in and one always gets the good-looking precomposed U+1EA0 for the A and the decomposed B+U+0323 for the B. With context (and lualatex) one gets rather bad-looking output for the A^^^^0323 input, as the dot is misplaced, and no output at all for ^^^^1e04.

As the coverage for precomposed glyphs varies a lot across fonts, this is rather a nuisance. Is there a way to get the fontloader (also the one used by luaotfload) to do a substitution similar to the one done by harfbuzz?

\starttext
\catcode`\^= 7
Ạ A^^^^0323 %decomposed input -> U+1EA0 with xetex

Ạ ^^^^1ea0 %precomposed input

Ḅ B^^^^0323 %decomposed input

^^^^1e04 %precomposed input -> B+U+0323 with xetex

\stoptext

(I added the ^^-notation to avoid problems with copy&paste)

--
Ulrike Fischer
http://www.troubleshooting-tex.de/
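For readers who want to check the code points involved, here is a minimal sketch in Python (used only for illustration; it is not part of the thread and has nothing to do with the fontloader). It relies on the standard `unicodedata` module to show that canonical composition (NFC) turns A + U+0323 into U+1EA0, while canonical decomposition (NFD) turns U+1E04 into B + U+0323. The helper name `codepoints` is made up for this example.

```python
import unicodedata


def codepoints(text):
    """Return the code points of text as a list of 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


decomposed_a = "A\u0323"   # A followed by COMBINING DOT BELOW
precomposed_b = "\u1E04"   # LATIN CAPITAL LETTER B WITH DOT BELOW

# NFC composes a base letter and its mark when a precomposed character exists.
composed = unicodedata.normalize("NFC", decomposed_a)
# NFD splits a precomposed character into base letter plus mark.
decomposed = unicodedata.normalize("NFD", precomposed_b)

print(codepoints(composed))
print(codepoints(decomposed))
```

Both directions are purely a property of the text; whether a given font has a glyph for the result is a separate question, which is the point of the message above.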
On 2/23/2017 11:58 AM, Ulrike Fischer wrote:
[...]
In context one can enable a collapse (last week i found out that it was off by default).

Anyway, there are two issues here:

(1) one can in the input stream collapse the dot accent and the other character but of course an altered input might not be what one wants, for instance because fonts do not always provide a decompose (ccmp) feature or composed glyphs

(2) one can mess at the node list level, which has a potential drawback that one cannot get a character (explicit \char) there without the danger of it being mangled

so, whatever method one chooses, it has to be controlled, because in tex all is about control (think of verbatim)

I have a file here that implements a pseudo feature that does this kind of (trivial) magic and I can add that to the distribution.

(btw, I suppose that xetex can disable that on demand as from your post I deduce that it's default behaviour and fighting defaults is a pain).

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
On Thu, 23 Feb 2017 13:19:12 +0100, Hans Hagen wrote:
In context one can enable a collapse (last week i found out that it was off by default).
Anyway, there are two issues here:
(1) one can in the input stream collapse the dot accent and the other character but of course an altered input might not be what one wants,
No, changing the input is imho not a solution, as the document fonts can have different coverage of the glyphs. Whatever is done must be font dependent.
(btw, I suppose that xetex can disable that on demand as from your post I deduce that it's default behaviour and fighting defaults is a pain).
I don't think that one can disable the behaviour in xetex; the internal harfbuzz library is doing it. There is imho no way to get an A+combining accent in a document. I agree that it would be neat to be able to disable it, but on the whole: if I had only the choice between "the xetex substitution" and the current luatex/context behaviour I would prefer the first. For normal documents it is preferable.

Did you send the second mail only to me for a reason or did you only forget to add the list? Imho this is interesting for others too.

--
Ulrike Fischer
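The "font dependent" choice asked for here can be modelled in a few lines of Python. This is only a sketch of the behaviour described for xetex earlier in the thread, not how HarfBuzz or any fontloader is actually implemented; the function name `pick_form` and the coverage set `toy_coverage` are hypothetical.

```python
import unicodedata


def pick_form(text, coverage):
    """Pick a canonically equivalent form of text that the font can render.

    coverage is a set of code points (ints) for which a hypothetical
    font has glyphs. The composed form is preferred, then the
    decomposed form, and otherwise the input is returned unchanged.
    """
    def covered(candidate):
        return all(ord(ch) in coverage for ch in candidate)

    composed = unicodedata.normalize("NFC", text)
    if covered(composed):
        return composed
    decomposed = unicodedata.normalize("NFD", text)
    if covered(decomposed):
        return decomposed
    return text


# Toy font: has A, B, the combining dot below and U+1EA0, but not U+1E04.
toy_coverage = {0x41, 0x42, 0x0323, 0x1EA0}

print(pick_form("A\u0323", toy_coverage) == "\u1EA0")    # composed A is available
print(pick_form("\u1E04", toy_coverage) == "B\u0323")    # falls back to B + mark
```

With this toy coverage the A always ends up precomposed and the B always decomposed, regardless of which form was typed.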
On 2/23/2017 1:35 PM, Ulrike Fischer wrote:
Did you send the second mail only to me for a reason or did you only forget to add the list? Imho this is interesting for others too.
well, it had an attachment that you can test which is not meant for context (to which i'll add a similar collapse feature, off by default of course as an escape) .. if that kind of stuff makes it into the latex font code is up to others

btw, i suppose most context users enter composed glyphs anyway instead of separate thingies

Hans
On Thu, 23 Feb 2017 14:08:54 +0100, Hans Hagen wrote:
btw, i suppose most context users enter composed glyphs anyway instead of separate thingies
But my example (for the B with dot below) shows that this fails if the font doesn't have the precomposed glyph. Also, the problem is not so much controlling the direct input but copy&paste.

--
Ulrike Fischer
On 2/23/2017 3:05 PM, Ulrike Fischer wrote:
But my example (for the B with dot below) shows that this fails if the font doesn't have the precomposed glyph.
Also, the problem is not so much controlling the direct input but copy&paste.
collapsing only can work for direct chars (so not \char for which we need that pseudo feature) ... when you put this on the first line it should do better

% directives="filters.utf.collapse=yes"

\starttext
ạ ị Ạ g̣
\stoptext

should work (and it should be on by default, which somehow it wasn't, probably disabled when i added some other input handling but no one noticed so far)

Hans Hagen | PRAGMA ADE
On Thu, 23 Feb 2017 14:08:54 +0100, Hans Hagen wrote:
well, it had an attachment that you can test which is not meant for context (to which i'll add a similar collapse feature, off by default of course as an escape) .. if that kind of stuff makes it into the latex font code is up to others
I looked at the code and it actually uses an idea that I had already tried. The problem I couldn't solve was to decompose a glyph. Looking at a context example it seems that context can do it. The B with dot below (U+1E04) ends up as B+U+0323 in the pdf. But how does context do it? It doesn't happen with a similar latex example. There the U+1E04 is simply missing.

And why is the dot of the first B better placed than the second?

\starttext
\directlua {
fonts.handlers.otf.addfeature {
    name = "compose",
    type = "ligature",
    data = {
        ["Ạ"] = { "A", "̣" },
        ["Ḅ"] = { "B", "̣" },
    },
}
}

\font\test={file:lmroman10-regular.otf:+compose;}

\test

Ḅ Ạ Ḅ %why are both B in the pdf???

\stoptext

--
Ulrike Fischer
On 2/23/2017 4:12 PM, Ulrike Fischer wrote:
[...]
it's not a ligature but a multiple

fonts.handlers.otf.addfeature {
    name    = "decompose",
    type    = "multiple",
    nocheck = true, -- new trick
    data    = {
        ["Ḅ"] = { "Q", "̣" },
        ["Ạ"] = { "X", "̣" },
    },
}

as with all features we check against the font, so if the font has no Ḅ nothing happens and you won't see one either (as the font has none)

i'll add the nocheck option (but of course one can expect side effects when a font has nothing relevant)

Hans Hagen | PRAGMA ADE
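The difference between the two substitution types can be illustrated with a toy model in Python. This is a sketch of the semantics as described in the message above (a ligature maps several glyphs to one, a multiple maps one glyph to several, and a rule is skipped when the font lacks the source glyph unless `nocheck` is set). It is not the fontloader's code; the function names and the use of plain code points as "glyphs" are assumptions made for the example.

```python
def apply_ligature(glyphs, table):
    """Many-to-one: replace each component sequence by its single target."""
    out, i = [], 0
    while i < len(glyphs):
        for target, parts in table.items():
            if glyphs[i:i + len(parts)] == parts:
                out.append(target)
                i += len(parts)
                break
        else:
            # No rule matched at this position, keep the glyph as it is.
            out.append(glyphs[i])
            i += 1
    return out


def apply_multiple(glyphs, table, font_glyphs, nocheck=False):
    """One-to-many: replace a single glyph by a sequence of glyphs.

    Without nocheck a rule only fires when the (hypothetical) font
    actually contains the source glyph.
    """
    out = []
    for glyph in glyphs:
        if glyph in table and (nocheck or glyph in font_glyphs):
            out.extend(table[glyph])
        else:
            out.append(glyph)
    return out


compose = {0x1EA0: [0x41, 0x0323]}     # A + dot below -> U+1EA0
decompose = {0x1E04: [0x42, 0x0323]}   # U+1E04 -> B + dot below
toy_font = {0x41, 0x42, 0x0323, 0x1EA0}  # no glyph for U+1E04

print(apply_ligature([0x41, 0x0323], compose))
print(apply_multiple([0x1E04], decompose, toy_font))
print(apply_multiple([0x1E04], decompose, toy_font, nocheck=True))
```

In this model the decomposition of U+1E04 only happens with `nocheck=True`, because the toy font has no glyph for the source character.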
On Thu, 23 Feb 2017 16:55:08 +0100, Hans Hagen wrote:
it's not a ligature but a multiple
fonts.handlers.otf.addfeature {
    name = "decompose",
    type = "multiple",
    nocheck = true, -- new trick
I updated my context version and changed my luaotfload.conf so that it uses the context fontloader. Then the following plain tex document (and a similar latex document) works and gives the wanted output.

BUT: If I uncomment the A+U+0323 then I get a fatal error:

texmf-var/luatex-cache/generic/fonts/otl/lmroman10-regular.luc)table={
 [7684]={ 66, 803 },
}
! error: (linebreak): invalid list tail, probably missing glue
! ==> Fatal error occurred, no output PDF file produced!
Press any key . . .

(that's from the terminal output, the log doesn't show the "table=..." part).

This "invalid list tail" is popping up now and then. Philip even found a version where context crashed: https://github.com/lualatex/luaotfload/issues/388

\input luaotfload.sty

\directlua {
fonts.handlers.otf.addfeature {
    name = "compose",
    type = "ligature",
    data = {
        ["Ạ"] = { "A", "̣" },
    },
}
}
\directlua{
fonts.handlers.otf.addfeature {
    name = "decompose",
    type = "multiple",
    nocheck = true,
    data = {
        ["Ḅ"] = { "B", "̣" },
    },
}
}
%\begin{document}
\font\test={file:lmroman10-regular.otf:mode=node;+decompose;+compose;}

\test

Ḅ Ạ
% Ạ %uncomment this to get a fatal error.
Ḅ

\bye

With the standard fontloader of luaotfload there is no error but the output is not correct.

--
Ulrike Fischer
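The numbers in the `table={ ... }` part of the terminal output are decimal code points. A small Python sketch (illustration only, using the standard `unicodedata` module; the variable name `logged` is made up) decodes them and shows that the entry is the decomposition rule from the "decompose" feature:

```python
import unicodedata

# Copied from the terminal output quoted above.
logged = {7684: [66, 803]}

for source, parts in logged.items():
    print(f"U+{source:04X} {unicodedata.name(chr(source))}")
    for cp in parts:
        print(f"  U+{cp:04X} {unicodedata.name(chr(cp))}")
```

So 7684 is U+1E04 (the precomposed B with dot below), and 66 and 803 are U+0042 and U+0323, the base letter and the combining dot below.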
On 2/23/2017 6:26 PM, Ulrike Fischer wrote:
With the standard fontloader of luaotfload there is no error but the output is not correct.
I think that you cannot drop the new context code in an old otfload, because (1) afaik otfload patches code, and (2) because the context code assumes luatex 1.0.3 at least. Philip sent me some test files a while ago so these bugs might have been fixed.

(There are no real fundamental changes, most is performance related and there have been improvements in discretionary handling. I have no clue if I added all relevant helpers to the generic code but normally Philip checks that.)

Hans
On Thu, 23 Feb 2017 18:41:28 +0100, Hans Hagen wrote:
I think that you cannot drop the new context code in an old otfload, because (1) afaik otfload patches code,
Well, not every fontloader version works, and it is always possible that a too new (or too old) context fontloader breaks, but it doesn't hurt to try.
and (2) because the context code assumes luatex 1.0.3 at least.
Bingo. With luatex 1.0.4 it works and the fatal error is gone ;-)

--
Ulrike Fischer
On Thu, Feb 23, 2017 at 6:26 PM, Ulrike Fischer wrote:
This "invalid list tail" is popping up now and then. Philip even found a version where context crashed: https://github.com/lualatex/luaotfload/issues/388
https://bitbucket.org/phg/lua-la-tex-tests/src/tip/context/cnt-luatex-2-cras...

\starttext
\def \feats {+kern;mode=base} %% “node” works
\definefontfeature [crash] [kern=yes,mode=base]
\definedfont [file:Iwona-Regular.otf*crash]
participated
\stoptext

no crash here with luatex 1.0.4 on my local box (will commit soon to experimental)

--
luigi
On Thu, 23 Feb 2017 18:45:04 +0100, luigi scarso wrote:
no crash here with luatex 1.0.4 on my local box
I just tried the lualatex examples with my luatex 1.0.4 too (I got it from w32tex.org) and the error seems to be gone.

--
Ulrike Fischer
On Thu, Feb 23, 2017 at 6:50 PM, Ulrike Fischer wrote:
I just tried the lualatex examples with my luatex 1.0.4 too (I got it from w32tex.org) and the error seems to be gone.

ah w32tex.. right.
(offtopic) So... can you check if the png/svg backend of mplib in luatex is still available?
-- luigi
On 2/23/2017 6:26 PM, Ulrike Fischer wrote:
With the standard fontloader of luaotfload there is no error but the output is not correct.
btw, plain tests can be done with

mtxrun --script plain --make
mtxrun --script plain yourfile

(at least that is how Luigi and I test generic when plain crashes are reported)

Hans
I looked at the code and it actually uses an idea that I had already tried. The problem I couldn't solve was to decompose a glyph. Looking at a context example it seems that context can do it. The B with dot below (U+1E04) ends up as B+U+0323 in the pdf. But how does context do it?
It uses the Unicode composition information (part of UnicodeData.txt), they’re made into a Lua table in ConTeXt (named char-def.lua, if it hasn’t changed).

Best,
Arthur
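As an illustration of the kind of data meant here: Python's standard `unicodedata` module exposes the decomposition field of UnicodeData.txt directly. The sketch below reads it for the two characters discussed in this thread. It says nothing about how char-def.lua stores or uses the data, and the helper name `canonical_parts` is made up for the example.

```python
import unicodedata


def canonical_parts(ch):
    """Return the canonical decomposition of ch as a list of code points.

    Returns an empty list when the character has no decomposition or
    only a compatibility decomposition (those start with a <tag>).
    """
    field = unicodedata.decomposition(ch)   # e.g. "0042 0323" for U+1E04
    if not field or field.startswith("<"):
        return []
    return [int(value, 16) for value in field.split()]


for ch in ("\u1EA0", "\u1E04", "A"):
    parts = [f"U+{cp:04X}" for cp in canonical_parts(ch)]
    print(f"U+{ord(ch):04X} -> {parts}")
```

A table built this way maps each precomposed character to its base letter and combining mark, which is the same shape as the `data` tables in the feature definitions posted in this thread.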
On Thu, 23 Feb 2017 17:20:05 +0100, Arthur Reutenauer wrote:
It uses the Unicode composition information (part of UnicodeData.txt), they’re made into a Lua table in ConTeXt (named char-def.lua, if it hasn’t changed).
I know of char-def.lua but the question was more *how* the information is used and *when*. In an input callback? Through a font feature?

--
Ulrike Fischer
On 2/24/2017 10:35 AM, Ulrike Fischer wrote:
I know of char-def.lua but the question was more *how* the information is used and *when*. In an input callback? Through a font feature?
in the case of the feature i sent you it's a font feature (so it operates on the stream of characters that the font machinery sees) but in context we can also do it in the input (several ways)

Hans
participants (4)
- Arthur Reutenauer
- Hans Hagen
- luigi scarso
- Ulrike Fischer