On 8/22/2021 1:17 AM, Robert wrote:
On 19.08.21 18:30, Hans Hagen wrote:
On 8/19/2021 5:00 PM, Robert wrote:
Hans has replied off-list, saying that this is basically expected behaviour and that checking for ligatures by means of \lastnodetype is inherently unreliable in luatex. In that case I would suggest changing the wording in the manual, which quite unequivocally claims the opposite:
| The \lastnodetype primitive is ε-TeX compliant. The valid range is still [−1, 15] and glyph nodes (formerly known as char nodes) have number 0 while ligature nodes are mapped to 7. That way macro packages can use the same symbolic names as in traditional ε-TeX. (p. 123)
This is correct. It doesn't say that the node list is the same, and in fact luatex does report the right node in eTeX speak.
Hm, just saying that the numbers are in the same range, while they may actually mean something totally different, is not really what I would call "compliant"...
From the (rest of the) manual it's clear that luatex (1) has different nodes, (2) has split up the interwoven "read input, handle fonts, hyphenate when needed" approach, and (3) lets one plug in font handler functions. So this eTeX \lastnodetype command just looks back at the moment it is invoked, and *that* is what you get back (with the luatex glyph node number changed into zero, which in luatex is actually an hlist node).
In the case of luatex there is no ligature node because the node list hasn't been processed yet, and even then it could just as well be a disc node.
Well yes, something comparable (I guess) happens in etex/pdftex: without the \relax after the ligature, they also just report a glyph node; with the \relax, however, they do report a ligature (or disc) node. But with luatex it makes no difference whether there's a \relax after the ligature or not. That's kind of the crux of my report, I suppose.
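To reproduce the comparison described here, a minimal plain TeX test file could look like the following. This is a sketch, not the original test file (which is not part of this excerpt); the file name and case labels are arbitrary. Going by the reports in this thread, pdftex and xetex show a glyph node (0) in case 1 and a ligature or disc node in case 2, whereas luatex gives the same value for cases 1 and 2 and reports the ligature code (7) only in case 3.

```tex
% lastnodetype-test.tex: hypothetical plain TeX test file (name is arbitrary).
% Run it with pdftex, xetex and luatex (plain formats) and compare the output.
% In Computer Modern, "--" is a ligature (en dash).

% Case 1: \the\lastnodetype directly after the ligature. It is expandable,
% so it is presumably evaluated while TeX is still looking ahead for a
% possible third hyphen, i.e. before the ligature node has been finished.
Case 1: --\the\lastnodetype

% Case 2: the unexpandable \relax ends the lookahead before
% \lastnodetype is evaluated.
Case 2: --\relax\the\lastnodetype

% Case 3: build the ligature inside a box first, then unbox it and
% look at the last node of the unboxed material.
Case 3: \setbox0\hbox{--}\unhbox0 \the\lastnodetype

\bye
```

Note that logging the value with \message or \showthe would not reproduce case 1, because those commands are unexpandable and therefore end the lookahead just like \relax does.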
Because, as said, the list is handled after it has been completely constructed (if you don't believe this, just compare the pdftex source with the luatex source).
Also, luatex does get the node type right when the ligature is wrapped in a box first:
\setbox0\hbox{--} \unhbox0 \the\lastnodetype % OK
So deep down luatex seems to know better...
Compare that with:
\setbox0\hbox{--\the\lastnodetype}
Again: the whole list is read, then it is treated (and that treatment can be anything, even removing these hyphens). You either look at the last node immediately (in the list currently being constructed) or you look at it after the list has been 'typeset'. (If \lastnodetype could be negative, you would even get different results, because the minus sign would then give you three hyphens in a row.)
Just don't assume that luatex, pdftex and xetex produce the same node lists.
Not even if there's no OpenType font involved? And just for the record, xetex does report the same as pdftex.
Indeed: reading, hyphenation, ligaturing and kerning are split into separate stages (unless overloaded, which can be done).
And don't assume that f + i is a ligature in every font (or script/language) either, because it can just as well be some kerning between f (substituted or not) and i (substituted or not).
I have no idea why you would think that I assume that f+i is a ligature in every font (I don't), and furthermore, I have no idea what this has to do with \lastnodetype not returning the expected value (my example didn't even contain "fi").
Because if you test the (eTeX) last node type and expect a char or ligature node type (you explicitly point to codes 0 and 7 being different in the engines), you cannot predict what comes out. In luatex, glyph and ligature are subtypes of one node, not different nodes, so looking at the last node type in order to see what one gets is unreliable with respect to this detail: there is no guarantee that the ligature subtype is set, so the eTeX number 7 might quite often not show up when you look at the end of an unboxed list.
In luatex, when you want to mess around at that level, you have to use a callback (or preprocess the input).
Not quite sure how preprocessing the /input/ could tell me whether a /font/ has a specific ligature. Also I'm a bit baffled that expecting a luatex command to be compatible with etex/pdftex (as per the manual) should be tantamount to "messing around".
If you want to know that, you are best off writing a callback that looks at the list after it has gone through the font handler.

The messing around refers to 'looking at a specific node with \lastnodetype while processing input, and acting on that'.

BTW, in general the only \lastnodetype tests that are more or less reliable are those for penalties, kerns and glue (inserts and marks can travel, whatsits can be anything).

(There are more differences like this: {} doesn't break a ligature, for instance, and reprocessing an unboxed list can also have side effects, depending on which callbacks kick in. There are also subtle differences, some under mode control, with respect to successive hyphens, because these are handled at a different time in luatex.)

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------