Okay, this is weird (and long), but it gets clearer near the end:

The following test, which uses lpeg to split comma-separated values, works perfectly (never mind that subpattern B does not do anything):

----------------------------------------------------------------------
% A:
% \catcode`\:=11
% \def\FM:ifFileIncluded#1{\message{whatever}}
%
% \FM:ifFileIncluded{}

\directlua0{\unexpanded{
  whiteSpace = lpeg.S(" \t\n")

  splitComma = lpeg.P({
    lpeg.Ct(lpeg.V("elem") * (lpeg.V("sep") * lpeg.V("elem"))^0),
    sep = lpeg.S(",{}"),
    elem = whiteSpace^0 * lpeg.C((1 - lpeg.V("sep"))^1) * whiteSpace^0, % B
  })
}}

\def\splitComma#1{%
  \directlua0{%
    local s = '\luaescapestring{\unexpanded{#1}}'
    local t = lpeg.match(splitComma,s)
    for k,v in ipairs(t) do
      texio.write_nl('[' .. v .. ']')
    end
  }%
}

\splitComma{A, B, C, D, E, F}

% `print' is not documented, but prints a compiled pattern's bytecode
% to the console
\directlua0{lpeg.print(splitComma)}

\end
----------------------------------------------------------------------

The pattern created is:

[1 = elem 2 = sep 3 = elem 4 = sep ]
00: call -> 2
01: jmp -> 51
02: opencapture table(n = 0) (0)
03: call -> 10
04: choice -> 8 (0)
05: call -> 41
06: call -> 10
07: partial_commit -> 5
08: closecapture close(n = 0) (0)
09: ret
10: span [(09-0a)(20)]
19: opencapture simple(n = 0) (0)
20: choice -> 23 (0)
21: call -> 41
22: failtwice
23: any * 1
24: choice -> 30 (0)
25: choice -> 28 (0)
26: call -> 41
27: failtwice
28: any * 1
29: partial_commit -> 25
30: closecapture close(n = 0) (0)
31: span [(09-0a)(20)]
40: ret
41: set [(2c)(7b)(7d)]
50: ret
51: end

However, if I run this with my own format, the pattern is:

[1 = elem 2 = sep 3 = elem 4 = sep ]
00: call -> 2
01: jmp -> 51
02: opencapture table(n = 0) (0)
03: call -> 10
04: choice -> 8 (0)
05: call -> 41
06: call -> 10
07: partial_commit -> 5
08: closecapture close(n = 0) (0)
09: ret
10: span [(0a)(20)(2e)(42)(45)(49)(4c-4d)(4f-50)(53)]
19: opencapture simple(n = 0) (0)
20: choice -> 23 (0)
21: call -> 41
22: failtwice
23: any * 1
24: choice -> 30 (0)
25: choice -> 28 (0)
26: call -> 41
27: failtwice
28: any * 1
29: partial_commit -> 25
30: closecapture close(n = 0) (0)
31: span [(0a)(20)(2e)(42)(45)(49)(4c-4d)(4f-50)(53)]
40: ret
41: set [(2c)(7b)(7d)]
50: ret
51: end

So instead of checking for whitespace in instructions 10 and 31, input is checked against the character set "\n .BEILMOPS" (and splitting the list fails almost completely).

I know what a "Beil" is (an axe), and I know what a "Mops" is (some kind of weird animal, a bit like a groundhog [think Bill Murray]), but what is a "Beilmops"? And a "dot Beilmops"? Is it written in C# or what?

Now you'll say "yeah, sure, who cares what weird stuff that weird Jonathan does in that weird format of his", but: This only happens if a macro "\FM:ifFileIncluded" is defined or referenced. If this macro is called "\FM:ifFileLoadeded" (the same length) or "\FM:ifFileIlcluded" ("l" instead of "n"), the pattern is compiled correctly. "\FM:ifFileLncluded" works, too. But the moment a control sequence called "\FM:ifFileIncluded" is used (defined or referenced), the lpeg pattern contains that strange animal.

I tried to use Lua state 1 instead of 0 to make sure there were no definitions that could create a side-effect, but the pattern remained the same.

I tried to uncomment (A) in the PlainTeX code above, but the pattern is still correct. So I first suspected some kind of overflow in TeX's hash table that only occurs when a lot of control sequences already exist and one of them has a very specific name and thus hash value (this does not seem to be the case, though).

I tried moving the definition of "\FM:ifFileIncluded" to the beginning of my format (right after setting the catcodes), but without success. I tried defining it in the PlainTeX format, but again to no avail. I tried removing all unnecessary files from my format, with the same result.
More weirdness that I more or less accidentally stumbled upon:

  \directlua1{\unexpanded{lpeg.print(lpeg.S(" \t\n"))}}

results in

  set [(0a)(20)(2e)(42)(45)(49)(4c-4d)(4f-50)(53)]

which is wrong, but

  \directlua1{\detokenize{lpeg.print(lpeg.S(" \t\n"))}}

results in

  set [(09-0a)(20)]

which is correct. And the equivalent, but slightly longer

  \directlua1{lpeg.print(lpeg.S(" \string\t\string\n"))}

again results in the correct

  set [(09-0a)(20)]

And indeed: If I replace "\unexpanded" in the code example above by "\detokenize" (and remove the empty lines, which for some reason result in "\par" when "\detokenize" is used, but not with "\unexpanded"), the pattern is compiled correctly and works as expected.

The plot thickens. Let's look further; how about simply telling Lua to print the string " \t\n"?

  \directlua1{\detokenize{texio.write_nl("[ \t\n]")}}

results in

  [ ]

but

  \directlua1{\unexpanded{texio.write_nl("[ \t\n]")}}

results in

  [ IMPOSSIBLE. ]

Not so weird anymore: "IMPOSSIBLE." is printed by procedure "print_cs" in luatex.web if the control sequence's pointer is below "active_base", that is zero or negative, or >= the pointer to the undefined control sequence (at least as far as I understand it). And "IMPOSSIBLE." sorted and stripped of duplicates is ... ".BEILMOPS"!

Also note that "\t" is defined in PlainTeX, but not in my format. If I define it at the beginning of the code example above, nothing changes. But if I define it before defining "\FM:ifFileIncluded", everything works as expected, and

  \directlua1{\unexpanded{texio.write_nl("[ \t\n]")}}

results in

  [ ]

Not "IMPOSSIBLE." anymore.

Now: If the control sequence passed to "print_cs" (or "tokenlist_to_cstring" in luatoken.c) is undefined, "IMPOSSIBLE." is printed. As "\t" is indeed undefined, this is completely expected. What's not expected is that this only happens if the macro "\FM:ifFileIncluded" is defined before "\t" is (if "\t" is defined at all).
And weird again: "\n" is defined by neither PlainTeX nor my format, but does not result in "IMPOSSIBLE.".

Side note:

  \immediate\write16{\detokenize{[ \t\n]}}
  \immediate\write16{\unexpanded{[ \t\n]}}

both correctly display "[ \t \n ]". So it seems that "\unexpanded" works as expected, but something else does not.

And finally: If I say "\let\t\t" at the beginning of my format, everything works as well. So "\t" may well be undefined, as long as it is entered into TeX's hash table before "\FM:ifFileIncluded" is.

Jonathan
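The ".BEILMOPS" observation can be checked mechanically. The following is a small Python sketch, not LuaTeX code; the helper name `decode_lpeg_set` is made up for illustration, and the string `received` is an assumption about what Lua was handed once "\t" had been rendered as "IMPOSSIBLE.". It expands the character sets printed by `lpeg.print` and compares the broken one with the sorted, de-duplicated characters of that string.

```python
import re


def decode_lpeg_set(dump: str) -> str:
    """Expand a charset as printed by lpeg.print, e.g. '[(09-0a)(20)]'."""
    chars = []
    # Each item is either a single hex byte "(20)" or a byte range "(4c-4d)".
    for low, high in re.findall(r"\(([0-9a-f]{2})(?:-([0-9a-f]{2}))?\)", dump):
        first = int(low, 16)
        last = int(high, 16) if high else first
        chars.extend(chr(code) for code in range(first, last + 1))
    return "".join(chars)


correct = decode_lpeg_set("[(09-0a)(20)]")
broken = decode_lpeg_set("[(0a)(20)(2e)(42)(45)(49)(4c-4d)(4f-50)(53)]")

# Assumption: the Lua string literal that arrived instead of " \t\n".
received = " IMPOSSIBLE.\n"

print(repr(correct))
print(repr(broken))
print(broken == "".join(sorted(set(received))))
```

Under these assumptions the correct set decodes to tab, newline and space, while the broken set is exactly the character inventory of " IMPOSSIBLE." plus a newline, which also explains why the tab (09) is missing from it.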
Jonathan Sauer wrote:
Okay, this is weird (and long), but it gets clearer near the end:
The pattern created is:

(this list is not really the place for in-depth lua discussions -)

[...]

However, if I run this with my own format, the pattern is:
                               ^^^^^^^^^^

[...]

when i run it in context i get:

[A]
[B]
[C]
[D]
[E]
[F][1 = elem 2 = sep 3 = elem 4 = sep ]
00: call -> 2
01: jmp -> 51
02: opencapture table(n = 0) (0)
03: call -> 10
04: choice -> 8 (0)
05: call -> 41
06: call -> 10
07: partial_commit -> 5
08: closecapture close(n = 0) (0)
09: ret
10: span [(09-0a)(20)]
19: opencapture simple(n = 0) (0)
20: choice -> 23 (0)
21: call -> 41
22: failtwice
23: any * 1
24: choice -> 30 (0)
25: choice -> 28 (0)
26: call -> 41
27: failtwice
28: any * 1
29: partial_commit -> 25
30: closecapture close(n = 0) (0)
31: span [(09-0a)(20)]
40: ret
41: set [(2c)(7b)(7d)]
50: ret
51: end

so it looks like you have to track down what your format is doing

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
Jonathan Sauer wrote:
Okay, this is weird (and long), but it gets clearer near the end:
Definitely weird.
However, if I run this with my own format, the pattern is:
How does that work, running this example in your format? I saved your example as weird.tex (attached), but

  luatex -fmt=luaktex weird

gives me:

  This is LuaTeX, Version snapshot-0.20.1-2007121015 (Web2C 7.5.6)
  (weird.tex
  ! Undefined control sequence.
  l.7 \directlua
                0{\unexpanded{
  ? x

Best wishes, Taco
Hello,
However, if I run this with my own format, the pattern is:
How does that work, running this example in your format? I saved your example as weird.tex (attached), but
luatex -fmt=luaktex weird
gives me:
This is LuaTeX, Version snapshot-0.20.1-2007121015 (Web2C 7.5.6)
(weird.tex
! Undefined control sequence.
l.7 \directlua
              0{\unexpanded{
? x
I'm sorry, I forgot to note that for "weird.tex" to run with luaktex, all primitives have to be prefixed with ":":

------------------------------------------------------------------------
\:directlua0{\:unexpanded{
  whiteSpace = lpeg.S(" \t\n")

  splitComma = lpeg.P({
    lpeg.Ct(lpeg.V("elem") * (lpeg.V("sep") * lpeg.V("elem"))^0),
    sep = lpeg.S(",{}"),
    elem = whiteSpace^0 * lpeg.C((1 - lpeg.V("sep"))^1) * whiteSpace^0, % B
  })
}}

\:def\splitComma#1{%
  \:directlua0{%
    local s = '\:luaescapestring{\:unexpanded{#1}}'
    local t = lpeg.match(splitComma,s)
    for k,v in ipairs(t) do
      texio.write_nl('[' .. v .. ']')
    end
  }%
}

\splitComma{A, B, C, D, E, F}

% `print' is not documented, but prints a compiled pattern's bytecode
% to the console
\:directlua0{lpeg.print(splitComma)}
------------------------------------------------------------------------

This should work.
Best wishes, Taco
Jonathan
Jonathan Sauer wrote:
I'm sorry, I forgot to note that for "weird.tex" to run with luaktex, all primitives have to be prefixed with ":":
Ok, it runs fine now using your format. Better than expected, actually: with my latest local executable I get the correct output (no bug):

[1 = elem 2 = sep 3 = elem 4 = sep ]
00: call -> 2
01: jmp -> 51
02: opencapture table(n = 0) (0)
03: call -> 10
04: choice -> 8 (0)
05: call -> 41
06: call -> 10
07: partial_commit -> 5
08: closecapture close(n = 0) (0)
09: ret
10: span [(09-0a)(20)]
19: opencapture simple(n = 0) (0)
20: choice -> 23 (0)
21: call -> 41
22: failtwice
23: any * 1
24: choice -> 30 (0)
25: choice -> 28 (0)
26: call -> 41
27: failtwice
28: any * 1
29: partial_commit -> 25
30: closecapture close(n = 0) (0)
31: span [(09-0a)(20)]
40: ret
41: set [(2c)(7b)(7d)]
50: ret
51: end

Perhaps the IMPOSSIBLE. error was just a side-effect of another bug that is already fixed (that is possible, and not even all that unlikely, as there were some "uninitialized memory" errors in 0.20.0) or possibly the behaviour depends on the used texmf.cnf and/or architecture (also possible, but those would be bad).

Best wishes, Taco
Hello,
I'm sorry, I forgot to note that for "weird.tex" to run with luaktex, all primitives have to be prefixed with ":":
Ok, it runs fine now using your format. Better than expected, actually: with my latest local executable I get the correct output (no bug): [...] Perhaps the IMPOSSIBLE. error was just a side-effect of another bug that is already fixed (that is possible, and not even all that unlikely, as there were some "uninitialized memory" errors in 0.20.0) or possibly the behaviour depends on the used texmf.cnf and/or architecture (also possible, but those would be bad).
I'll try it again this evening using 0.20.1 (or should I use the current trunk?).
Best wishes, Taco
Jonathan
Hello,
Perhaps the IMPOSSIBLE. error was just a side-effect of another bug that is already fixed (that is possible, and not even all that unlikely, as there were some "uninitialized memory" errors in 0.20.0) or possibly the behaviour depends on the used texmf.cnf and/or architecture (also possible, but those would be bad).
Well, the bad news is: The bug is still in the trunk as of yesterday morning/noon. The good news is: I narrowed down the problem to a simple format and a simple test file (attached):

1. Create the format `weird_format' using this command line:

   luatex --fmt=weird_format --ini --jobname=weird_format weird_format

2. Typeset 'weird_plain.tex' using this format:

   luatex --fmt=weird_format weird_plain

3. Observe the result:

   luatex --fmt=weird_format weird_plain
   This is LuaTeX, Version snapshot-0.20.1-2007121117 (Web2C 7.5.6)
   (weird_plain.tex
   [A][1 = elem 2 = sep 3 = elem 4 = sep ]
   00: call -> 2
   01: jmp -> 51
   02: opencapture table(n = 0) (0)
   03: call -> 10
   04: choice -> 8 (0)
   05: call -> 41
   06: call -> 10
   07: partial_commit -> 5
   08: closecapture close(n = 0) (0)
   09: ret
   10: span [(0a)(20)(2e)(42)(45)(49)(4c-4d)(4f-50)(53)]
   19: opencapture simple(n = 0) (0)
   20: choice -> 23 (0)
   21: call -> 41
   22: failtwice
   23: any * 1
   24: choice -> 30 (0)
   25: choice -> 28 (0)
   26: call -> 41
   27: failtwice
   28: any * 1
   29: partial_commit -> 25
   30: closecapture close(n = 0) (0)
   31: span [(0a)(20)(2e)(42)(45)(49)(4c-4d)(4f-50)(53)]
   40: ret
   41: set [(2c)(7b)(7d)]
   50: ret
   51: end
   (undefined) (macro:#1->) [0] )
   Output written on weird_plain.dvi (1 page, 144 bytes).
   Transcript written on weird_plain.log

4. Change \unexpanded to \detokenize and observe:

   luatex --fmt=weird_format weird_plain
   This is LuaTeX, Version snapshot-0.20.1-2007121117 (Web2C 7.5.6)
   (weird_plain.tex
   [A]
   [B]
   [C]
   [D]
   [E]
   [F][1 = elem 2 = sep 3 = elem 4 = sep ]
   00: call -> 2
   01: jmp -> 51
   02: opencapture table(n = 0) (0)
   03: call -> 10
   04: choice -> 8 (0)
   05: call -> 41
   06: call -> 10
   07: partial_commit -> 5
   08: closecapture close(n = 0) (0)
   09: ret
   10: span [(09-0a)(20)]
   19: opencapture simple(n = 0) (0)
   20: choice -> 23 (0)
   21: call -> 41
   22: failtwice
   23: any * 1
   24: choice -> 30 (0)
   25: choice -> 28 (0)
   26: call -> 41
   27: failtwice
   28: any * 1
   29: partial_commit -> 25
   30: closecapture close(n = 0) (0)
   31: span [(09-0a)(20)]
   40: ret
   41: set [(2c)(7b)(7d)]
   50: ret
   51: end
   (undefined) (macro:#1->) [0] )
   Output written on weird_plain.dvi (1 page, 144 bytes).
   Transcript written on weird_plain.log.

5. Wonder why a .dvi file is created, even though there is no output ;-)

6. Try running weird_initex.tex in IniTeX:

   luatex --ini --jobname=weird_initex weird_initex
   This is LuaTeX, Version snapshot-0.20.1-2007121117 (Web2C 7.5.6) (INITEX)
   (weird_initex.tex
   [A]
   [B]
   [C]
   [D]
   [E]
   [F][1 = elem 2 = sep 3 = elem 4 = sep ]
   00: call -> 2
   01: jmp -> 51
   02: opencapture table(n = 0) (0)
   03: call -> 10
   04: choice -> 8 (0)
   05: call -> 41
   06: call -> 10
   07: partial_commit -> 5
   08: closecapture close(n = 0) (0)
   09: ret
   10: span [(09-0a)(20)]
   19: opencapture simple(n = 0) (0)
   20: choice -> 23 (0)
   21: call -> 41
   22: failtwice
   23: any * 1
   24: choice -> 30 (0)
   25: choice -> 28 (0)
   26: call -> 41
   27: failtwice
   28: any * 1
   29: partial_commit -> 25
   30: closecapture close(n = 0) (0)
   31: span [(09-0a)(20)]
   40: ret
   41: set [(2c)(7b)(7d)]
   50: ret
   51: end
   (undefined) (macro::ifFileIncluded#1->:ifFileIncluded) [0] )
   Output written on weird_initex.dvi (1 page, 144 bytes).
   Transcript written on weird_initex.log.

7. Play with the two commented 'introductions' of \t in weird_format.tex (since '\t' is only introduced, not defined there).
So: The problem only manifests itself if '\t' is not in TeX's cs hash table when '\FM:ifFileIncluded' is defined, and if '\FM:ifFileIncluded' is defined in the format.

Hypothesis: A bug during format dumping/loading.

@Taco: If you cannot reproduce the bug, maybe it is an issue with endianness: Intel processors use little endian, while PPCs (used in the old Macs) use big endian.
Best wishes, Taco
Jonathan
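Why the broken run prints only "[A]" can be traced by hand. The following Python sketch models the splitComma grammar (it is not lpeg and not part of the test files; the function name `split_comma` and its `whitespace` parameter are chosen here for illustration). It assumes PEG semantics as lpeg implements them: repetitions are greedy and never give characters back.

```python
SEPARATORS = ",{}"


def split_comma(text, whitespace=" \t\n"):
    """Model of the grammar: elem (sep elem)^0, with the captures collected in a list."""

    def elem(pos):
        # whiteSpace^0: greedy, and it never gives characters back.
        while pos < len(text) and text[pos] in whitespace:
            pos += 1
        start = pos
        # lpeg.C((1 - sep)^1): capture at least one non-separator character.
        while pos < len(text) and text[pos] not in SEPARATORS:
            pos += 1
        if pos == start:
            return None  # nothing left to capture, so elem fails
        # The trailing whiteSpace^0 (subpattern B) has nothing left to match,
        # because the capture already ran up to the next separator.
        return text[start:pos], pos

    first = elem(0)
    if first is None:
        return None
    items, pos = [first[0]], first[1]
    while pos < len(text) and text[pos] in SEPARATORS:
        following = elem(pos + 1)
        if following is None:
            break  # (sep elem)^0 stops after the last complete repetition
        items.append(following[0])
        pos = following[1]
    return items


print(split_comma("A, B, C, D, E, F"))
print(split_comma("A, B, C, D, E, F", whitespace="\n .BEILMOPS"))
```

With the intended whitespace set all six elements are returned. With the ".BEILMOPS" set, the leading "whitespace" loop of the second element swallows " B", the capture then finds only a comma, the element fails, and the match ends after "A", which is consistent with the single "[A]" in step 3.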
Jonathan Sauer wrote:

... zip file ...

when using lpeg and matching, keep in mind that \unexpanded has the side effect of introducing spaces

\edef\oeps{\unexpanded{\t\n}} \meaning\oeps
\edef\oeps{\string\t\string\n} \meaning\oeps

indeed your weird file produces an empty page but that due to your macro ... the simple example

\pdfoutput1 \pdfcompresslevel=0
\def\splitComma{%
\directlua0{s=1}%
}
\splitComma

also produces an empty page, just call \show\splitComma

this has to do with the fact that you use tabs in your file and you have not handled tab in your format

btw, if you change \unexpanded by \detokenize you get the desired result

a simple test shows ...

\directlua0{lpeg.print(lpeg.S(" \string\t\string\n"))}
\directlua0{\unexpanded{lpeg.print(lpeg.S(" \t\n"))}}
\directlua0{\detokenize{lpeg.print(lpeg.S(" \t\n"))}}

00: set [(09-0a)(20)]
09: end
[]
00: set [(0a)(20)(2e)(42)(45)(49)(4c-4d)(4f-50)(53)]
09: end
[]
00: set [(09-0a)(20)]
09: end

a simpler test is:

\directlua0{print("1 \string\t\string\n")}
\directlua0{\unexpanded{print("2 \t\n")}}
\directlua0{\detokenize{print("3 \t\n")}}

this gives

1
2 IMPOSSIBLE.
3

a simple write as in

\immediate\write16{1 \string\t\string\n}
\immediate\write16{\unexpanded{2 \t\n}}
\immediate\write16{\detokenize{3 \t\n}}

1 \t\n
2 \t \n
3 \t \n

(so, your problem is unrelated to lpeg)

Hans
Hello,
when using lpeg and matching, keep in mind that \unexpanded has the side effect of introducing spaces
Oh. I used \unexpanded, because sometime someone on this mailing list noted that it was faster than \detokenize.
this has to do with the fact that you use tabs in your file and you have not handled tab in your format
Ups. Right. Thanks! Note to self: Catcodes are evil.
btw, if you change \unexpanded by \detokenize you get the desired result
I know.
(so, your problem is unrelated to lpeg)
I know :-) I just stayed with the lpeg example for some reason.
Hans
Jonathan
"Jonathan Sauer"
when using lpeg and matching, keep in mind that \unexpanded has the side effect of introducing spaces
Oh. I used \unexpanded, because sometime someone on this mailing list noted that it was faster than \detokenize.
Should make no difference, I guess. \detokenize should introduce the same spaces, using the same algorithm.
btw, if you change \unexpanded by \detokenize you get the desired result
I know.
Oops. Why would that be?

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
David Kastrup wrote:
"Jonathan Sauer"
writes:
when using lpeg and matching, keep in mind that \unexpanded has the side effect of introducing spaces
Oh. I used \unexpanded, because sometime someone on this mailing list noted that it was faster than \detokenize.
Should make no difference, I guess. \detokenize should introduce the same spaces, using the same algorithm.
btw, if you change \unexpanded by \detokenize you get the desired result
I know.
Oops. Why would that be?
Well, it could be a bug, perhaps. :-)
Hello,
when using lpeg and matching, keep in mind that \unexpanded has the side effect of introducing spaces
Oh. I used \unexpanded, because sometime someone on this mailing list noted that it was faster than \detokenize.
Should make no difference, I guess. \detokenize should introduce the same spaces, using the same algorithm.
Well, currently it makes a difference, three actually:

1. \unexpanded introduces "IMPOSSIBLE."

2. \unexpanded introduces spaces after control sequences.

3. \unexpanded ignores \par's, while \detokenize does not. So empty lines in \directlua are no problem with \unexpanded, but result in a (Lua) parse error when \detokenize is used.

At least in the context of \directlua.

Also, in terms of speed: \unexpanded only marks all tokens as unexpandable, while \detokenize creates new character tokens. So the latter should be somewhat slower.
btw, if you change \unexpanded by \detokenize you get the desired result
I know.
Oops. Why would that be?
Because of a bug? :-)
Jonathan
"Jonathan Sauer"
Hello,
when using lpeg and matching, keep in mind that \unexpanded has the side effect of introducing spaces
Oh. I used \unexpanded, because sometime someone on this mailing list noted that it was faster than \detokenize.
Should make no difference, I guess. \detokenize should introduce the same spaces, using the same algorithm.
Well, currently it makes a difference, three actually:
1. \unexpanded introduces "IMPOSSIBLE."
Hm.
2. \unexpanded introduces spaces after control sequences.
Why wouldn't \detokenize do the same? Wait, it does:

dak@lola:~$ etex
This is pdfTeXk, Version 3.141592-1.40.3 (Web2C 7.5.6)
 %&-line parsing enabled.
**\message{\detokenize{\a b}}
entering extended mode
\a b
*\message{\unexpanded{\a b}}
\a b
*\message{\detokenize{\:x}}
\:x
*\message{\unexpanded{\:x}}
\:x
*\end
No pages of output.
Transcript written on texput.log.
3. \unexpanded ignores \par's, while \detokenize does not. So empty lines in \directlua are no problem with \unexpanded, but result in a (Lua) parse error when \detokenize is used.
At least in the context of \directlua.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I guess that this may be the real reason.
Also, in terms of speed: \unexpanded only marks all tokens as unexpandable, while \detokenize creates new character tokens. So the latter should be somewhat slower.
No question about that.
btw, if you change \unexpanded by \detokenize you get the desired result
I know.
Oops. Why would that be?
Because of a bug? :-)
I should think so. Not sure whether this behavioral difference is intended, but it certainly feels wrong to me.
Hello, [\unexpanded vs \detokenize]
Well, currently it makes a difference, three actually:
1. \unexpanded introduces "IMPOSSIBLE."
Hm.
Addition: Only in the very specific situation created by the format weird_format.tex (see my mail from 09:06 this morning).
2. \unexpanded introduces spaces after control sequences.
Why wouldn't \detokenize do the same? Wait, it does:
At least in the context of \directlua.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I guess that this may be the real reason.
I thought a bit about this, and I think that it is quite useful if \detokenize does not introduce spaces when used in \directlua, as opposed to \unexpanded, since sometimes (e.g. when used as a set in lpeg.S) spaces are completely undesired. Of course, in these situations \string could be used as an alternative. But since \detokenize is equivalent (at least as I understand it) to prefixing each token with \string, \detokenize should not introduce spaces. IMO.
btw, if you change \unexpanded by \detokenize you get the desired result
I know.
Oops. Why would that be?
Because of a bug? :-)
I should think so. Not sure whether this behavioral difference is intended, but it certainly feels wrong to me.
I think we are talking about slightly different things: You about the introduction of spaces by \unexpanded and \detokenize, me about the "IMPOSSIBLE.". The latter most certainly is a bug :-)
Jonathan
"Jonathan Sauer"
[\unexpanded vs \detokenize]
2. \unexpanded introduces spaces after control sequences.
Why wouldn't \detokenize do the same? Wait, it does:
At least in the context of \directlua.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I guess that this may be the real reason.
I thought a bit about this, and I think that it is quite useful if \detokenize does not introduce spaces when used in \directlua, as opposed to \unexpanded, since sometimes (i.e. when used as a set in lpeg.S) spaces are completely undesired.
Of course, in these situations \string could be used as an alternative. But since \detokenize is equivalent (at least as I understand it)
Your understanding is wrong.

etex
This is pdfTeXk, Version 3.141592-1.40.3 (Web2C 7.5.6)
 %&-line parsing enabled.
**\message{\detokenize{\x}a}
entering extended mode
\x a
*\message{\string\x a}
\xa
*\end
No pages of output.
Transcript written on texput.log.
to prefixing each token with \string, \detokenize should not introduce spaces. IMO.
I disagree. I don't think that \detokenize should behave differently within \directlua. Not least of all since "within" is very fuzzy to define when macro expansion is involved.
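The spacing rule visible in the two etex sessions can be written down as a small model. This is a Python sketch, not TeX; the function names are made up here, tokens are given as plain strings, and `str.isalpha` stands in for "has category code 11 (letter)", which is an assumption that only holds for default catcodes.

```python
def show_token(token: str) -> str:
    """Render one token the way the sessions above show detokenized output."""
    if token.startswith("\\") and len(token) > 1:
        name = token[1:]
        # A control symbol (one non-letter character) gets no trailing space.
        if len(name) == 1 and not name.isalpha():
            return token
        # Every other control sequence is followed by a space.
        return token + " "
    return token


def detokenize(tokens) -> str:
    """Model of \\detokenize applied to an already tokenized list."""
    return "".join(show_token(token) for token in tokens)


def string_primitive(token: str) -> str:
    """Model of \\string: the name of a single token, without a trailing space."""
    return token


print(detokenize(["\\a", "b"]))         # compare with \message{\detokenize{\a b}}
print(detokenize(["\\:", "x"]))         # compare with \message{\detokenize{\:x}}
print(detokenize(["\\x"]) + "a")        # compare with \message{\detokenize{\x}a}
print(string_primitive("\\x") + "a")    # compare with \message{\string\x a}
```

The last two lines differ by one space, which is the difference between "\x a" and "\xa" in the session: \string applies to a single token and appends nothing, so \detokenize is not the same as prefixing every token with \string.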
Jonathan Sauer wrote:
Hello,
[\unexpanded vs \detokenize]
Well, currently it makes a difference, three actually:
1. \unexpanded introduces "IMPOSSIBLE."
Hm.
Addition: Only in the very specific situation created by the format weird_format.tex (see my mail from 09:06 this morning).
there is indeed something going on with \t; we're looking into it

Hans
Hi, Jonathan Sauer wrote:
5. Wonder why a .dvi file is created, even though there is no output ;-)
Let me answer this first, as this question is easy: there is an explicit tab character in the input file, here:

\def\splitComma#1{%
        \directlua0{%
^^^^^^^^

Because the minimalistic format does not have plain's \catcode`\^^I=10, that ^^I gets category code 12, and so is typeset (in the \nullfont).
Hypothesis: A bug during format dumping/loading.
(Un)fortunately, I still get the correct output. But there are lots of valgrind warnings while dumping the format, so I will investigate those next. A bug during the format dumping seems most likely.
@Taco: If you cannot reproduce the bug, maybe it is an issue with endianness: Intel processors use little endian, while PPCs (used in the old macs) use big endian.
Wouldn't be the first one, either. Perhaps I should write a funding proposal for ppc hardware. Best wishes, Taco
Hi Jonathan, Jonathan Sauer wrote:
Okay, this is weird (and long), but it gets clearer near the end:
It turned out that this bug was related to having a non-zero value for hash_extra in texmf.cnf. The current svn trunk should run OK. Best wishes, Taco
Hello Taco,
Okay, this is weird (and long), but it gets clearer near the end:
It turned out that this bug was related to having a non-zero value for hash_extra in texmf.cnf. The current svn trunk should run OK.
Great! I will test it this evening.
Best wishes, Taco
Thanks, Jonathan
Hello,
Okay, this is weird (and long), but it gets clearer near the end:
It turned out that this bug was related to having a non-zero value for hash_extra in texmf.cnf. The current svn trunk should run OK.
To quote an old Tugboat column: "Hey, it works!" :-)

However, there still is another little thing concerning \par inside \unexpanded and \detokenize inside \directlua. The following PlainTeX example:

---------------------------------------------------------------------
\def\example{}
\directlua0{texio.write_nl("1: \detokenize{\par}")}
\directlua0{texio.write_nl("2: \unexpanded{\par}")}
\directlua0{texio.write_nl("3: \par")}
\directlua0{texio.write_nl("4: \luaescapestring{\detokenize{\par}}")}
\directlua0{texio.write_nl("5: \luaescapestring{\unexpanded{\par}}")}
\directlua0{texio.write_nl("6: \luaescapestring{\par}")}
\immediate\write16{7: \detokenize{\par}}
\immediate\write16{8: \unexpanded{\par}}
\immediate\write16{9: \meaning\par}
\directlua0{texio.write_nl("A: \detokenize{\if}")}
\directlua0{texio.write_nl("B: \unexpanded{\if}")}
\directlua0{texio.write_nl("C: \detokenize{\example}")}
\directlua0{texio.write_nl("D: \unexpanded{\example}")}
\end
---------------------------------------------------------------------

results in:

---------------------------------------------------------------------
luatex weird_par.tex
This is LuaTeX, Version snapshot-0.20.1-2007121218 (Web2C 7.5.6)
(weird_par.tex
1: par
2:
3:
4: \par
5: \par
6: \par
7: \par
8: \par
9: \par
A: if
B: if
C: example
D: example )
No pages of output.
Transcript written on weird_par.log.
---------------------------------------------------------------------

So the primitive \par is gobbled inside \directlua unless it has been detokenized using \detokenize or escaped using \luaescapestring, whereas neither the primitive \if nor the macro \example is gobbled (the backslashes are gobbled, because Lua interprets `\p' as `p', `\i' as `i' and `\e' as `e' when inside a string [not documented, but an implementation artefact]). This only happens inside \directlua; inside \write, the result is the same in all cases.
This means that if I use \directlua alone or with \unexpanded, I can have empty lines in the Lua source, since they are ignored. If I use \detokenize, however, they result in a \par and consequently a parse error (unless in a Lua string, then they result in an inserted "par "). Is this a bug? A feature? I attached my texmf.cnf ("user" overrides "base") in case this phenomenon is related to the configuration.
Best wishes, Taco
Jonathan
Jonathan Sauer wrote:
This means that if I use \directlua alone or with \unexpanded, I can have empty lines in the Lua source, since they are ignored. If I use \detokenize, however, they result in a \par and consequently a parse error (unless in a Lua string, then they result in an inserted "par ").
Is this a bug? A feature?
My current guess is 'feature'. The tokentostring function for \directlua has 'inhibit_par' turned on, so that you don't get those pesky \par's that confuse the lua parser. The output from \detokenize is simply passed on, because it is not a \par token.

General remark: such attempts to squeeze tex input to behave like lua source code are hard to debug and even harder to predict. It is much less confusing if you create either an environment like

\startluacode .. \stopluacode

with appropriate catcode changes, or if you put the lua code in a separate file and dofile('thefile').

Best wishes in any case, Taco
On Thu, Dec 13, 2007 at 02:40:07PM +0100, Taco Hoekwater wrote:
General remark: such attempts to squeeze tex input to behave like lua source code are hard to debug and even harder to predict. It is much less confusing if you create either an environment like
\startluacode .. \stopluacode
with appropriate catcode changes,
And \endlinechar=10 \catcode10=12 to enable the Lua parser seeing multiple lines instead of one merged huge line.
or if you put the lua code in a separate file and dofile('thefile').
Or use Lua's module management and make a separate file as module:
module('foobar', package.seeall)
...
That gets loaded via
require('foobar')
Yours sincerely
Heiko
Heiko Oberdiek
On Thu, Dec 13, 2007 at 02:40:07PM +0100, Taco Hoekwater wrote:
General remark: such attempts to squeeze tex input to behave like lua source code are hard to debug and even harder to predict. It is much less confusing if you create either an environment like
\startluacode .. \stopluacode
with appropriate catcode changes,
And \endlinechar=10 \catcode10=12 to enable the Lua parser seeing multiple lines instead of one merged huge line.
What's the beef with that? Except inside of strings (where \n will work fine), the Lua parser does not consider line endings to be different from spaces.
On Thu, Dec 13, 2007 at 06:18:24PM +0100, David Kastrup wrote:
Heiko Oberdiek
writes:
On Thu, Dec 13, 2007 at 02:40:07PM +0100, Taco Hoekwater wrote:
General remark: such attempts to squeeze tex input to behave like lua source code are hard to debug and even harder to predict. It is much less confusing if you create either an environment like
\startluacode .. \stopluacode
with appropriate catcode changes,
And \endlinechar=10 \catcode10=12 to enable the Lua parser seeing multiple lines instead of one merged huge line.
What's the beef with that? Except inside of strings (where \n will work fine), the Lua parser does not consider line endings to be different from spaces.
\catcode`\{=1
\catcode`\}=2
\directlua0{
do
local foo
error("Error in third line")
end
}
\endlinechar=10 %
\directlua0{%
do
local foo
error("Same line, but useful line number")
end
}%
\endlinechar=13 %
\end
luatex --ini test
This is LuaTeX, Version snapshot-0.20.0-2007120612 (Web2C 7.5.6) (INITEX)
(test.tex
! LuaTeX error [string "luas[0]"]:1: Error in third line.
l.8 }
?
! LuaTeX error [string "luas[0]"]:3: Same line, but useful line number.
l.15 }
%
?
)
No pages of output.
Transcript written on test.log.
Yours sincerely
Heiko
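The two line numbers in the log (1 versus 3) follow from what the Lua parser is handed. The following Python sketch only models that effect and is not LuaTeX code; the function names are invented for illustration, and it assumes that an end of line reaches Lua as a space with the default settings and as a real newline with \endlinechar=10.

```python
def chunk_seen_by_lua(tex_lines, line_end):
    """Join the lines inside the braces the way the Lua chunk ends up looking."""
    return "".join(line + line_end for line in tex_lines)


def line_of(chunk, needle):
    """1-based line number, within the chunk, on which 'needle' starts."""
    return chunk.count("\n", 0, chunk.index(needle)) + 1


# First \directlua0{ ... }: the remainder of the opening line is empty,
# and every end of line is turned into a space.
merged = chunk_seen_by_lua(
    ["", "do", "local foo", 'error("Error in third line")', "end"], " ")

# Second \directlua0{% ... }%: the opening line end is commented out,
# and the following line ends survive as newlines.
separate = chunk_seen_by_lua(
    ["do", "local foo", 'error("Same line, but useful line number")', "end"], "\n")

print(line_of(merged, "error("))
print(line_of(separate, "error("))
```

In the merged chunk everything sits on line 1, so the reported position carries no information; with real newlines the error call is on line 3 of the chunk, matching the second message in the log.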
Heiko Oberdiek wrote:
\catcode`\{=1
\catcode`\}=2
\directlua0{
do
local foo
error("Error in third line")
end
}
\endlinechar=10 %

Also:

\directlua0{
do
local foo -- not used
error("File gives an error because the comment does not end")
end
}

Best wishes, Taco
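The effect of the "--" comment can be modelled in a few lines. This is a Python sketch rather than Lua; the helper `strip_lua_line_comments` is a deliberately naive stand-in for the Lua lexer (it ignores strings and long comments, which is an assumption that happens to be harmless for this example).

```python
def strip_lua_line_comments(chunk: str) -> str:
    """Drop everything from '--' to the end of each line (naive: no strings, no long comments)."""
    return "\n".join(line.split("--", 1)[0] for line in chunk.split("\n"))


lines = [
    "do",
    "local foo -- not used",
    'error("File gives an error because the comment does not end")',
    "end",
]

merged = " ".join(lines)      # line ends turned into spaces: one long line
separate = "\n".join(lines)   # line ends kept as newlines

merged_code = strip_lua_line_comments(merged).split()
separate_code = strip_lua_line_comments(separate).split()

print(merged_code)
print("end" in separate_code)
```

Once the lines are merged, the comment runs to the end of the whole chunk, so only "do local foo" is left and the block is never closed; with newlines preserved the comment stops at the end of its own line and the closing "end" is still seen.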
Taco Hoekwater
Heiko Oberdiek wrote:
\catcode`\{=1
\catcode`\}=2
\directlua0{
do
local foo
error("Error in third line")
end
}
\endlinechar=10 %

Also:

\directlua0{
do
local foo -- not used
error("File gives an error because the comment does not end")
end
}

Well, it's LuaTeX, not Luaweb or so. So just use

\directlua0{
do
local foo % not used
error("File gives an error because the comment does end")
end
}
On Thu, Dec 13, 2007 at 08:42:05PM +0100, David Kastrup wrote:
Taco Hoekwater
writes:
Heiko Oberdiek wrote:

\catcode`\{=1
\catcode`\}=2
\directlua0{
do
local foo
error("Error in third line")
end
}
\endlinechar=10 %

Also:

\directlua0{
do
local foo -- not used
error("File gives an error because the comment does not end")
end
}

Well, it's LuaTeX, not Luaweb or so. So just use

\directlua0{
do
local foo % not used
error("File gives an error because the comment does end")
end
}
local s = "abc.def.ghi"
string.gsub(s, "%.", "/") % ups
;-)
Yours sincerely
Heiko
Well, it's LuaTeX, not Luaweb or so. So just use
\directlua0{
do
local foo % not used
error("File gives an error because the comment does end")
end
}

And you would have us write

\directlua0{
function print_with_exclam(str)
  % Prints 'str' with an exclamation mark
  texio.write_nl(string.format("\string\%s!\string\n", str))
end
print_with_exclam("Hello, world")
}

rather than

\startluacode
function print_with_exclam(str)
  -- Prints 'str' with an exclamation mark
  texio.write_nl(string.format("%s!\n", str))
end
\stopluacode

? That's not very readable and extremely error-prone, I find.

Arthur
participants (6)
- Arthur Reutenauer
- David Kastrup
- Hans Hagen
- Heiko Oberdiek
- Jonathan Sauer
- Taco Hoekwater