[Dev-luatex] Very simple sample?

David Kastrup dak at gnu.org
Sat May 12 13:00:18 CEST 2007

David Kastrup <dak at gnu.org> writes:

> You are thinking too complicated.  \detokenize requires a
> brace-matched input anyway which is scanned using current catcodes, so
> you gain nothing at all by circumventing premature argument scanning.
> So you can just do
> \long\def\startlua#1\endlua{\directlua0\expandafter{\detokenize{#1}}}
> and that's it.

Actually, it is more efficient to use \unexpanded instead of
\detokenize here.  \unexpanded takes the token list and just passes it
to TeX's printer.  The printed rendition is then fed into Lua.

In contrast, \detokenize takes the token list and transforms it into
a token list consisting just of character tokens by passing it through
TeX's printer and retokenizing the resulting string into character
tokens.  This converted token list is then passed to TeX's printer.
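
As a sketch of the \unexpanded variant, keeping the \startlua ... \endlua
delimiters of the quoted definition and the \directlua0{...} syntax of
the snapshot used in this thread (I am assuming the usual catcodes for
braces and #, which INITEX does not set up by itself):

```tex
% Sketch only: same delimiters as the quoted \detokenize version.
% \directlua expands its argument, so \unexpanded keeps #1 from being
% expanded before TeX's printer renders it for Lua.
\long\def\startlua#1\endlua{\directlua0{\unexpanded{#1}}}

% Hypothetical usage: the Lua code is still scanned as a brace-matched
% macro argument under the current catcodes.
\startlua tex.print("hello") \endlua
```

The \expandafter of the \detokenize version is not needed here, since
\unexpanded does its work while the argument of \directlua is being
expanded.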

While playing around I noticed that \directlua is actually an
expansible construct.  Is its expansion empty, or does it consist of
the tex.print output turned into tokens?  My guess was that, like
\input, its expansion is empty but it switches the input stream
(which, among other things, implies that you can change TeX's
tokenization rules within the stream).

So I tried the following (the idea being to see at which point in time
the catcode assignment takes effect).

 luatex -ini --progname tex
This is luaTeX, Version 3.141592-snapshot-2007040322 (Web2C 7.5.6) (INITEX)
**\catcode`{1 \catcode`}2

*\directlua0{\unexpanded{tex.print"\\catcode`\\!=14 !\\junk"}}!\jill

! Too many }'s.
<*> }


So it would appear that the comment character takes effect
immediately.  Will it extend to the end of the line?

 luatex -ini --progname tex
This is luaTeX, Version 3.141592-snapshot-2007040322 (Web2C 7.5.6) (INITEX)
**\catcode`{1 \catcode`}2

*\directlua0{\unexpanded{tex.print"\\catcode`\\!=14 !\\junk"}}\jill
! Undefined control sequence.
<*> ...{tex.print"\\catcode`\\!=14 !\\junk"}}\jill


No.  Comments started inside of tex.print _stay_ inside of tex.print,
even though no \endlinechar treatment is done.  It is my guess that
the behavior would be the same for the \scantextokens sequence.

It turns out that control sequence names and comments stay confined
to each tex.print call: as far as tokenization and the extent of
comments are concerned, parsing does not even extend into the material
from consecutive tex.print statements.
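
One way to probe this, as a sketch in the style of the sessions above
(the split name \ju / nk is made up for illustration, and I am not
quoting any output for it): divide a control sequence name across two
tex.print calls.

```tex
% Hypothetical test.  If tokenization ran across tex.print boundaries,
% TeX would read the single control sequence \junk.  With the behavior
% described above it should instead read \ju followed by the letters nk.
\directlua0{\unexpanded{tex.print("\\ju"); tex.print("nk")}}
```

Under INITEX neither name is defined, so the "Undefined control
sequence" message should show which name TeX actually read.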

While this might have been obvious to some, I find it interesting
enough to mention.  I also find I like this choice.

David Kastrup, Kriemhildstr. 15, 44793 Bochum
