Hello David,
comparing LuaTeX and LilyPond integration of their respective extension languages. [...] I've added some rough sketches at the end of the article that should make clear why this can't be done in formats alone but will require primitive support as well if things are supposed to turn out nicely.
I believe that some of the points you raise, and the syntax you propose, could be obtained at the format level. Below, I'm just throwing ideas out; feel free to kill most of them.

For instance, it is possible to get the syntax

    \luadef parshape ... \endluadef

(i.e., replacing end by \endluadef in your example): just read everything from \luadef to \endluadef with verbatim category codes.

A side note: rather than using \noexpand in

    \directlua{tex.print("\noexpand\\message{Hi}")}

you can use \unexpanded, as in

    \def\nexplua#1{\directlua{\unexpanded{#1}}}
    \nexplua{tex.print("\\message{Hi}")}

However, your point remains: the action of \unexpanded and \detokenize (which are equivalent in this setting, since the argument of \directlua, once expanded, is turned into a string) depends on the category codes in place when the tokens are converted to a string. I encountered a similar issue when writing a LaTeX package for regular expressions: many regular expression constructs use escaped letters, or characters that can have special category codes for TeX.

One solution is to follow in the footsteps of \verb, changing category codes before reading the Lua code. If I remember correctly, \begin{luacode}...\end{luacode} does this in LuaLaTeX, and LuaTeX surely has a similar facility. But this is not expandable. Presumably, one could perform catcode changes expandably in \directlua.

Either way, category code changes will encounter a big problem: the user will write working code, then try to put it in a macro, and fail, because the TeX parser will not know that one part of the definition is meant to become Lua code.

A way out would be for LuaTeX's "eyes" to recognize which part of the code is TeX and which part is Lua. In fact, you allude to this possibility when proposing a new catcode. It is possible to achieve this distinction while keeping the existing catcodes:

    \def\foo#1#2{%
      %
      % Here, normal TeX catcodes are in effect.
      % This is a comment, but we can do useful
      \message{things with #1 and #2.}
      %
      #(-- This is a Lua comment, then code.
        function mess(x)
          tex.print(
            "\\message{argument = " .. x .. "}")
        end
        mess(#(#1#))
      #)%
    }

Here, I've gone for using #( and #), i.e., a macro parameter character (catcode 6) followed by a parenthesis, to switch between the TeX interpreter and the Lua interpreter. Then \foo{a}{b} displays two messages: "things with a and b." and "argument = a". This approach may encounter difficulties with nested definitions, but should be OK after some experimentation.

Feature request: rather than providing #(...#) directly at the engine level, it may be better to add a callback for when TeX is reading a macro definition and finds # followed by a non-digit, instead of producing an error. This callback could be used by package/format writers to change category code tables on the fly from within the definition. The Lua code does not need to be interpreted, although it may be more robust to at least tokenize it, to avoid finding #( or #) within Lua strings and wrongly interpreting them as a switch back to TeX.

A completely different solution, which requires no change to the engine and is purely macro-based, is to convert TeX tokens to Lua code with a loop that applies \string to each token individually. With the most naive macros, all spaces would need to be escaped, but it is possible to improve the macros so that spaces only need to be escaped when they follow control sequences, or when TeX would ignore them (e.g., multiple spaces in a row, spaces at the beginning of lines). The example above would become

    \def\foo#1#2{%
      \message{things with #1 and #2.}
      \lua{
        function\ mess(x)\
        tex.print("\\message{argument\ =\ "\ ..\ x\ ..\ "}")\
        end\
        mess(#(#1#))
      }}

With slightly better macros, one can get rid of those unsightly "\ ", except where the space would be ignored by TeX or follows a control sequence (remember that here we are converting actual TeX tokens into a string).

    \def\foo#1#2{%
      \message{things with #1 and #2.}
      \lua{
        function mess(x)
          tex.print("\\message{argument = " .. x .. "}")
        end
        mess(#(#1#))
      }}

--
Best regards,
Bruno
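
P.S. On the remark above that the Lua code should at least be tokenized, so that #( or #) inside a Lua string is not mistaken for a switch back to TeX: here is a minimal sketch of such a scan, written in plain Lua. The function name find_switch_back is hypothetical (it is not a LuaTeX callback or API), and the scan is deliberately simplified: it only knows about short strings in single or double quotes, and ignores long strings [[...]] and Lua comments.

```lua
-- Hypothetical helper, not part of LuaTeX: return the 1-based index of the
-- first "#)" in src that lies outside a quoted Lua string, or nil if none.
-- Simplification: long strings [[...]] and Lua comments are not recognized.
function find_switch_back(src)
  local i, n = 1, #src
  local quote = nil              -- current string delimiter, nil outside strings
  while i <= n do
    local c = src:sub(i, i)
    if quote then
      if c == "\\" then
        i = i + 1                -- skip the character escaped by the backslash
      elseif c == quote then
        quote = nil              -- closing delimiter: back outside the string
      end
    elseif c == '"' or c == "'" then
      quote = c                  -- opening delimiter: now inside a string
    elseif c == "#" and src:sub(i + 1, i + 1) == ")" then
      return i                   -- a genuine switch-back marker
    end
    i = i + 1
  end
  return nil
end

-- The first "#)" sits inside a string literal; only the second one counts.
local sample = 'mess("a #) b") #)'
print(find_switch_back(sample), sample:find("#)", 1, true))
```

On this sample, a plain substring search stops at the #) inside the string literal (position 9), whereas the scan skips it and reports the marker after the closing quote (position 16).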