David Kastrup
You are making this too complicated. \detokenize requires brace-matched input anyway, which is scanned using the current catcodes, so you gain nothing at all by circumventing premature argument scanning.
So you can just do \long\def\startlua#1\endlua{\directlua0\expandafter{\detokenize{#1}}} and that's it.
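For illustration, here is a minimal sketch of how that definition might be used. It assumes a plain TeX format running on LuaTeX (so that # has its usual catcode and \detokenize is available); the Lua body is made up for the example.

```tex
% Sketch only: assumes plain TeX on LuaTeX with \detokenize available.
\long\def\startlua#1\endlua{\directlua0\expandafter{\detokenize{#1}}}

% The argument is read with the current catcodes, then handed to Lua as text.
\startlua
  local s = 0
  for i = 1, 10 do s = s + i end
  tex.print("Sum: " .. s)
\endlua
\bye
```

This should typeset "Sum: 55". Since the argument is tokenized by TeX first, line ends turn into spaces (so a Lua "--" comment would swallow the rest of the chunk), % still starts a TeX comment, and \detokenize adds a space after every control word.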
Actually, it is more efficient to use \unexpanded instead of \detokenize here. \unexpanded takes the token list and just passes it to TeX's printer. The printed rendition is then fed into Lua. In contrast, \detokenize takes the token list and transforms it into a token list consisting just of character tokens by passing it through TeX's printer and retokenizing the resulting string into character tokens. This converted token list is then passed to TeX's printer again.

While playing around I noticed that \directlua is actually an expandable construct. Is its expansion empty, or does it consist of the tex.print output turned into tokens? My guess was that, like \input, its expansion is empty but it switches the input stream (which, among other things, implies that you can change TeX's tokenization rules within the stream). So I tried the following, the idea being to see at which point in time the catcode assignment takes effect.

    luatex -ini --progname tex
    This is luaTeX, Version 3.141592-snapshot-2007040322 (Web2C 7.5.6) (INITEX)
    **\catcode`{1 \catcode`}2
    *\directlua0{\unexpanded{tex.print"\\catcode`\\!=14 !\\junk"}}!\jill
    *}
    ! Too many }'s.
    <*> }

    ?
    *

So it would appear that the comment character takes effect immediately. Will it extend to the end of the line?

    luatex -ini --progname tex
    This is luaTeX, Version 3.141592-snapshot-2007040322 (Web2C 7.5.6) (INITEX)
    **\catcode`{1 \catcode`}2
    *\directlua0{\unexpanded{tex.print"\\catcode`\\!=14 !\\junk"}}\jill
    ! Undefined control sequence.
    <*> ...{tex.print"\\catcode`\\!=14 !\\junk"}}\jill

    ?
    *

No. Comments started inside of tex.print _stay_ inside of tex.print, even though no \endlinechar treatment is done. It is my guess that the behavior would be the same for the \scantextokens sequence.

It turns out that control sequence names and comments stay confined to each tex.print: as far as tokenization and the extent of comments are concerned, parsing does not even extend into the material from consecutive tex.print statements.
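The last observation can be checked with a small experiment along these lines. This is only a sketch: \foo and \foobar are hypothetical macros defined just for the test, and it assumes the same INITEX setup as in the sessions above, where \unexpanded is available (newer LuaTeX versions may need the e-TeX primitives enabled first).

```tex
% Hypothetical test macros; each one announces itself on the terminal.
\catcode`\{=1 \catcode`\}=2
\def\foo{\message{[foo]}}
\def\foobar{\message{[foobar]}}
% Two consecutive tex.print calls whose output would spell \foobar
% if tokenization carried over from one call to the next.
\directlua0{\unexpanded{tex.print("\\foo") tex.print("bar")}}
\end
```

If control sequence names are confined to each tex.print as described, the terminal should show [foo] and not [foobar].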
While this might have been obvious to some, I find it interesting enough to mention. I also find I like this choice.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum