[Dev-luatex] Extension language integration of LuaTeX and LilyPond

Bruno Le Floch blflatex at gmail.com
Tue May 7 16:01:18 CEST 2013

Hello David,

> comparing LuaTeX and LilyPond integration of their respective extension languages.
> [...]
> I've added some rough sketches at the end of the article that should
> make clear why this can't be done in formats alone but will require
> primitive support as well if things are supposed to turn out nicely.

I believe that some of the points you raise, and the syntax you
propose, could be obtained at the format level.  Below, I'm just
throwing ideas out; feel free to kill most of them.

For instance, it is possible to get the syntax \luadef parshape ...
\endluadef (i.e., replacing end by \endluadef in your example): just
read everything from \luadef to \endluadef with verbatim category
codes.
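
A rough, untested sketch of that idea for plain LuaTeX follows.  The
helper name \luadef@grab is made up here, the name after \luadef
(parshape in your example) is not treated specially, and the body is
simply handed to \directlua:

  \catcode`\@=11
  % Make all special characters "other", keep line ends, then grab
  % the body up to \endluadef.
  \def\luadef{%
    \begingroup
    \def\do##1{\catcode`##1=12 }\dospecials
    \catcode`\^^M=12
    \luadef@grab}
  % The delimiter must be a catcode-12 backslash followed by letters,
  % so | serves as a temporary escape character while defining it.
  \begingroup
    \catcode`\|=0 \catcode`\\=12
    |gdef|luadef@grab#1\endluadef{|endgroup|directlua{#1}}%
  |endgroup
  \catcode`\@=12

With these definitions, something like the following should reach Lua
unchanged, since % and # are ordinary characters while the body is
read:

  \luadef
  -- % and # need no escaping in here
  texio.write_nl(string.format("%d items", #{1, 2, 3}))
  \endluadef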

A side note: rather than using \noexpand in


you can use \unexpanded as


However, your point remains: the action of \unexpanded and \detokenize
(which are equivalent in this setting since the argument of
\directlua, once expanded, is turned into a string) depends on the
category codes in place when the source text is converted to tokens.  I
encountered a similar issue when writing a LaTeX package for regular
expressions: many regular expression constructs use escaped letters or
characters that can have special category codes for TeX.
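
To make this concrete, here is a small illustration of my own (not the
example from your article), assuming a format such as plain LuaTeX in
which \unexpanded is available and \TeX is defined:

  % \\ and ~ are protected from expansion, so Lua receives
  % tex.print("\\TeX~logo") and hands \TeX~logo back to TeX.
  \directlua{\unexpanded{tex.print("\\TeX~logo")}}
  % A Lua modulo such as "x = 7 % 3" cannot be written this way: with
  % the usual catcodes, % starts a TeX comment while the line is being
  % read, before \unexpanded has any say.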

One solution is to follow in the footsteps of \verb, changing category
codes before reading the lua code.  If I remember correctly,
\begin{luacode}...\end{luacode} does this in LuaLaTeX, and LuaTeX
surely has a similar facility.  But this is not expandable.
Presumably, one could perform catcode changes expandably in
\directlua.  Either way, category code changes will encounter a big
problem: the user will write working code, then try to put it in a
macro, and fail, because the TeX parser will not know that one part of
the definition is meant to become Lua code.
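
The failure is the same one that prevents \verb from working inside
macro arguments.  A minimal sketch, with the made-up names
\showpercent and \doshow:

  % Switch % to catcode 12, then read the argument.
  \def\showpercent{\begingroup \catcode`\%=12 \doshow}
  \def\doshow#1{\message{#1}\endgroup}
  % Direct use works: the argument is tokenized after the catcode
  % change has taken effect.
  \showpercent{50% done}
  % Inside a definition it fails: the body is tokenized with the
  % normal catcodes, so the % below would comment out the rest of the
  % line, closing braces included.
  %   \def\wrapped{\showpercent{50% done}}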

A way out would be that LuaTeX's "eyes" recognize what part of the
code is TeX, and what part is Lua.  In fact, you allude to this
possibility when proposing a new catcode.  It is possible to achieve
this distinction while keeping the existing catcodes:

  % Here, normal TeX catcodes are in effect.
  % This is a comment, but we can do useful
  \message{things with #1 and #2.}
  #(-- This is a Lua comment, then code.
    function mess(x) tex.print( "\\message{argument = " .. x .. "}") end

Here, I've gone for using #( and #), i.e., a macro parameter character
(catcode 6) followed by a parenthesis, to switch between the TeX
interpreter and the Lua interpreter.  Then \foo{a}{b} displays two
messages: "things with a and b." and "argument = a".  This approach
may encounter difficulties with nested definitions, but should be ok
after some experimentation.

Feature request: rather than providing #(...#) directly at the engine
level, it may be better to add a callback for when TeX is reading a
macro definition and finds # followed by a non-digit, instead of
producing an error.  This callback could be used by package/format
writers to change category code tables on the fly from within the
definition.  The Lua code does not need to be interpreted, although it
may be more robust to at least tokenize it, to avoid finding #( #)
within Lua strings and interpreting them wrongly as switching back to
TeX.

A completely different solution, which requires no change to the
engine and is purely macro-based, is to convert TeX tokens to Lua
code with a loop that applies \string to each token individually.
With the most naive macros, all spaces would need to be escaped, but
it is possible to improve those to only require spaces to be escaped
when following control sequences, or when TeX would ignore them (e.g.,
multiple spaces in a row, spaces at the beginning of lines).  The
example above would become

  \message{things with #1 and #2.}
    function\ mess(x)\
      tex.print("\\message{argument\ =\ "\ ..\ x\ ..\ "}")\
    mess(#(#1#)) }}

With slightly better macros, one can get rid of those unsightly "\ ",
except where a plain space would be ignored by TeX or would follow a
control sequence (remember that here we are converting actual TeX
tokens into a string):
  \message{things with #1 and #2.}
    function mess(x)
      tex.print("\\message{argument = " .. x .. "}")
    mess(#(#1#)) }}

Best regards,
