Assignments in TeX's mouth, or: "Dr. Zaius, it can talk!"
Hello,

normally, TeX cannot perform assignments in the mouth (except for \let-ing a previously undefined control sequence to \relax when using \csname...\endcsname). With LuaTeX, this has changed, albeit in a slightly weird way: All assignments are \global.

Plain example (with some variations on a theme):

---------------------------------- CUT ------------------------------
\tracingassigns1

% Access internal control sequences:
\catcode`@=11

% For easier logging:
\def\log#1{\immediate\write\sixt@@n{#1}}
\let~\space

\count@=5
\log{Before:~~~~~~~~~~~~~~~~~~~\the\count@}

% We only define \test, we do not execute it:
\begingroup
\edef\test{%
  \directlua0{%
    % Note that `tex.count.count@' (syntactic sugar for the code
    % below) cannot be used, since `@' is not a letter and thus
    % not part of an identifier:
    tex.count["count@"]=42
    % We create some contents for \test:
    tex.print("Contents of test")
  }%
}
% Behold: We changed \count@ in the mouth!
\log{After:~~~~~~~~~~~~~~~~~~~~\the\count@}
% Note that \test only contains the string passed to tex.print above.
% The rest of the Lua code has been executed completely:
\log{\string\test~is:~~~~~~~~~~~~~~~~~\meaning\test}
\endgroup
% Now that is weird: Assigning registers in Lua is automatically
% global:
\log{After \string\endgroup:~~~~~~~~~~\the\count@}

% We try again, without \edef:
\begingroup
\directlua0{tex.count["count@"]=23}
\log{In group, 2nd try:~~~~~~~~\the\count@}
\endgroup
% The assignment is global again:
\log{After \string\endgroup, 2nd try:~\the\count@}

% We try again, using accessor functions:
\begingroup
\directlua0{tex.setcount("count@",17)}
\log{In group, 3rd try:~~~~~~~~\the\count@}
\endgroup
% And global again:
\log{After \string\endgroup, 3rd try:~\the\count@}

% We try again, using accessor functions and \{b,e}group:
\bgroup
\directlua0{tex.setcount("count@",37)}
\log{In group, 3rd try:~~~~~~~~\the\count@}
\egroup
% And still global:
\log{After \string\endgroup, 3rd try:~\the\count@}

\bye
---------------------------------- CUT ------------------------------

Note: According to the Lua reference manual, section 2.1, the definition of `letter' depends on the current locale. Would it be possible/sensible to define the current locale not according to the system TeX is run on (which would actually make LuaTeX code locale-dependent, since some identifiers might not work on another computer with a different locale) but according to TeX's current catcode table? And while we're at it: Should string.upper and string.lower use the \lccode/\uccode tables?

When tracing assignments, the ones made from Lua are flagged as being \global (BTW: What are these messages about `reassigning [no_local_whatsits]'?):

---------------------------------- CUT ------------------------------
This is luaTeX, Version 3.141592-beta-0.10.2-2007081018 (Web2C 7.5.6) (format=luatex 2007.8.10) 11 AUG 2007 08:50
**MouthSideeffect
(MouthSideeffect.tex{into \tracingassigns=1}
{changing \log=macro:->\mathop {\rm log}\nolimits }
{into \log=macro:#1->\immediate \write \sixt@@n {\ETC.}
{changing ~=macro:->\penalty \@M \ }
{into ~=macro:-> }
{changing \count255=92}
{into \count255=5}
Before: 5
{reassigning [no_local_whatsits]=0}
{reassigning [no_local_dirs]=0}
{globally changing \count255=5}
{into \count255=42}
{changing \test=undefined}
{into \test=macro:->Contents of test}
After: 42
\test is: macro:->Contents of test
After \endgroup: 42
{reassigning [no_local_whatsits]=0}
{reassigning [no_local_dirs]=0}
{globally changing \count255=42}
{into \count255=23}
In group, 2nd try: 23
After \endgroup, 2nd try: 23
{reassigning [no_local_whatsits]=0}
{reassigning [no_local_dirs]=0}
{globally changing \count255=23}
{into \count255=17}
In group, 3rd try: 17
After \endgroup, 3rd try: 17
{reassigning [no_local_whatsits]=0}
{reassigning [no_local_dirs]=0}
{globally changing \count255=17}
{into \count255=37}
In group, 3rd try: 37
After \endgroup, 3rd try: 37
)
No pages of output.
---------------------------------- CUT ------------------------------

I do not think that assignments to TeX's registers done in Lua code should be automatically \global, as it makes writing macros without (intentional) side-effects much harder. And those macros are the best, since they can be used anywhere without having to remember that they clobber register \foo, redefine macro \bar et cetera.

So, returning to this mail's subject, I can only say: "Take your hands off my grouping, you damn dirty Lua code!" ;-)

Jonathan
Jonathan Sauer wrote:
I do not think that assignments to TeX's registers done in Lua code should be automatically \global, as it makes writing macros without (intentional) side-effects much harder. And those macros are the best, since they can be used anywhere without having to remember that they clobber register \foo, redefine macro \bar et cetera.
since we now have a nearly unlimited number of registers, one can define dedicated counters for tasks and treat them as global within the tex code as well
So, returning to this mail's subject, I can only say: "Take your hands off my grouping, you damn dirty Lua code!" ;-)
well, don't use tex.count then -) often the same can be accomplished by:

  tex.sprint("\\count123=",value)

which will honor grouping, or a variant of this

  \count123=\directlua0{... tex.sprint(value) }

introducing a grouping model in lua itself is messy; one has to live with the fact that both languages have different models; actually, once there is mplib, there is yet another grouping model -)

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
Hello!
I do not think that assignments to TeX's registers done in Lua code should be automatically \global, as it makes writing macros without (intentional) side-effects much harder. And those macros are the best, since they can be used anywhere without having to remember that they clobber register \foo, redefine macro \bar et cetera.
since we now have a nearly unlimited number of registers, one can define dedicated counters for tasks and treat them as global within the tex code as well
Well, this is not particularly elegant (admittedly a subjective opinion). Also, depending on the macro, it will not be reentrant. BTW: Why is it possible to have 268435456 (2^28) catcode tables? That strikes me, more than 65536 registers, as nearly unlimited. Why not "only" 65536, especially since one would likely run out of memory long before exhausting this amount?
So, returning to this mail's subject, I can only say: "Take your hands off my grouping, you damn dirty Lua code!" ;-)
well, don't use tex.count then -) often the same can be accomplished by:
tex.sprint("\\count123=",value)
But this is not expandable. Also, the assignment will be executed after the LuaTeX code, so other Lua code cannot use the register (it will use the old value). This creates a new kind of asynchronous execution, similar to TeX's mouth and stomach, only this time between Lua and TeX.
which will honor grouping, or a variant of this
\count123=\directlua0{... tex.sprint(value) }
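A minimal sketch of the timing point made above, assuming a recent LuaTeX with the plain format, where \count@ is \count255 and \directlua needs no instance number (the 2007 beta discussed in this thread wrote \directlua0). The messages are illustrative only. Text handed to tex.sprint is merely queued as TeX input and is executed after the Lua chunk has returned, so Lua code in the same chunk still reads the old value, while the assignment itself is an ordinary local one:

```tex
% Sketch only; run with a recent luatex and the plain format.
\catcode`@=11
\count@=5

\begingroup
  \directlua{
    % Queue "\count255=42 " as TeX input; nothing is assigned yet.
    tex.sprint([[\string\count255=42 ]])
    % Still inside the same Lua chunk: the register is unchanged.
    texio.write_nl("Lua sees: " .. tex.count[255])
  }
  % The queued assignment has been executed by now, locally.
  \message{In group: \the\count@}
\endgroup
\message{After \string\endgroup: \the\count@}

% The expandable variant: Lua only supplies the digits, and TeX
% itself performs the (local) assignment.
\begingroup
  \count@=\directlua{tex.sprint(tostring(6*7))}\relax
  \message{In group, 2nd variant: \the\count@}
\endgroup
\message{After \string\endgroup, 2nd variant: \the\count@}

\bye
```

If this behaves as described, Lua should report 5, both groups should report 42, and both messages after \endgroup should report 5 again.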
introducing a grouping model in lua itself is messy; one has to live with the fact that both languages have different models; actually, once there is mplib, there is yet another grouping model -)
Why exactly is it messy? I would assume -- without having looked into the source -- that to set a register, a procedure is called with the register type (count, skip ...), the register number, the new value and a flag if the assignment should be global. This procedure then takes care of handling the grouping. Why cannot this procedure be called from Lua code (or the tex library, to be precise) as well?

To leave the technical standpoint: When writing Lua code, the programmer uses Lua's scoping model. When accessing TeX's registers from Lua code, he/she/it uses TeX's grouping model, since they are part of TeX, not part of the Lua language. I do not think this is particularly messy.

Still, if assignments using tex.count et al. stay global, this should, IMO, be stated in the manual.

Jonathan
Jonathan Sauer wrote:
BTW: Why is it possible to have 268435456 (2^28) catcode tables? That strikes me, more than 65536 registers, as nearly unlimited. Why not "only" 65536, especially since one would likely run out of memory long before exhausting this amount?
the same is true for the number of lua instances, one seldom needs more than a few, but we saw no reason for a limitation
So, returning to this mail's subject, I can only say: "Take your hands off my grouping, you damn dirty Lua code!" ;-)
well, don't use tex.count then -) often the same can be accomplished by:
tex.sprint("\\count123=",value)
But this is not expandable. Also, the assignment will be executed after the LuaTeX code, so other Lua code cannot use the register (it will use the old value). This creates a new kind of asynchronous execution, similar to TeX's mouth and stomach, only this time between Lua and TeX.
which will honor grouping, or a variant of this
\count123=\directlua0{... tex.sprint(value) }
introducing a grouping model in lua itself is messy; one has to live with the fact that both languages have different models; actually, once there is mplib, there is yet another grouping model -)
Why exactly is it messy? I would assume -- without having looked into the source -- that to set a register, a procedure is called with the register type (count, skip ...), the register number, the new value and a flag if the assignment should be global. This procedure then takes care of handling the grouping. Why cannot this procedure be called from Lua code (or the tex library, to be precise) as well?
if i remember right, it was not as easy as that (this is true for more tex internals); maybe in the long run a more sophisticated grouping model will surface, for instance we've been discussing assignments to registers that migrate after the current group (handy for local calculations where the result has to be carried over) but this has a low priority (fonts and list manipulations have the highest)
To leave the technical standpoint: When writing Lua code, the programmer uses Lua's scoping model. When accessing TeX's registers from Lua code, he/she/it uses TeX's grouping model, since they are part of TeX, not part of the Lua language. I do not think this is particularly messy.
Still, if assignments using tex.count et al. stay global, this should, IMO, be stated in the manual.
sure, i agree with that; and we can also add that it is currently a limitation that may be removed in future releases -)

btw, a similar tricky area is in box manipulations, where one can construct a node list and assign it to a box ...

  tex.box[0] = head_of_my_new_list

if box[0] has content already, one has to make sure that this is freed properly, otherwise memory will remain allocated; so, for all interfaces certain 'rules of usage' apply.

keep in mind that luatex is a multi-year project -)

Hans
Hello,
BTW: Why is it possible to have 268435456 (2^28) catcode tables? That strikes me, more than 65536 registers, as nearly unlimited. Why not "only" 65536, especially since one would likely run out of memory long before exhausting this amount?
the same is true for the number of lua instances, one seldom needs more than a few, but we saw no reason for a limitation
That of course is true. Do you have any specific applications for catcode tables in mind?
[...] maybe in the long run a more sophisticated grouping model will surface, for instance we've been discussing assignments to registers that migrate after the current group (handy for local calculations where the result has to be carried over) but this has a low priority (fonts and list manipulations have the highest)
I think this can be done in TeX right now (although it is a bit complicated). In <http://groups.google.com/group/comp.text.tex/browse_thread/thread/deb4f30f546d7804/8954ab2efb7ab908>, a multi-token \aftergroup is described that could be used for this purpose. The \edef-based variation described in the second half would be a possible approach.
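For illustration, a minimal sketch of one common \edef-based idiom for carrying a locally computed value across the end of a group. It is not necessarily the exact variation described in the linked thread. The scratch macro name \x and the calculation are hypothetical, and the plain format is assumed (so that \count@ is available once `@' is a letter):

```tex
% Sketch only; works in plain TeX as well as in plain LuaTeX.
\catcode`@=11
\count@=5

\begingroup
  % A local calculation whose result should survive the group:
  \count@=21
  \multiply\count@ by 2
  % \edef expands \the\count@ right now (to 42), while \endgroup,
  % \count@ and \relax are unexpandable and are stored as they are.
  \edef\x{\endgroup \count@=\the\count@\relax}%
\x
% \x first closes the group (restoring the old value 5), then
% reassigns the saved result one level further out.
\message{After the group: \the\count@}

\bye
```

The assignment performed by \x is still a local one at the outer level, so the surrounding grouping is respected and nothing becomes \global.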
[...]
Jonathan
Jonathan Sauer wrote:
Hello,
BTW: Why is it possible to have 268435456 (2^28) catcode tables? That strikes me, more than 65536 registers, as nearly unlimited. Why not "only" 65536, especially since one would likely run out of memory long before exhausting this amount?
The question should be: why only 65536 registers instead of 2^28. The answer: there are not enough hours in a day.

Best wishes,
Taco
Jonathan Sauer wrote:
Hello,
BTW: Why is it possible to have 268435456 (2^28) catcode tables? That strikes me, more than 65536 registers, as nearly unlimited. Why not "only" 65536, especially since one would likely run out of memory long before exhausting this amount?
the same is true for the number of lua instances, one seldom needs more than a few, but we saw no reason for a limitation
That of course is true.
Do you have any specific applications for catcode tables in mind?
in context mkiv we have a couple defined, e.g. for verbatim

Hans
Jonathan Sauer wrote:
Note: According to the Lua reference manual, section 2.1, the definition of `letter' depends on the current locale. Would it be
Quite right. For that reason, luatex forces LC_CTYPE=C LC_COLLATE=C LC_NUMERIC=C while it is executing, I just never documented that. I will add a paragraph to the manual.
possible/sensible to define the current locale not according to the system TeX is run on (which would actually make LuaTeX code locale-dependent, since some identifiers might not work on another computer with a different locale) but according to TeX's current catcode table?
Category codes are quite often changed dynamically, so that could become a problem in real life documents rather quickly, I'm afraid. But I have not given this subject much thought so far, perhaps there is a way to do it.
And while we're at it: Should string.upper and string.lower use the \lccode/\uccode tables?
Most probably not, for a few reasons.

One: in traditional TeX, \lccode/\uccode is related to font encodings as well as input parsing. I am not quite ready yet to drop those side-effects, but writing more code on top of the existing stuff is not a smart idea right now.

Two: those tables are dynamic just like \catcode, and that would mean adding context-sensitivity to a quite low-level routine.

Three: (and this is by far the most important one) using string.upper and string.lower is not a good idea. It is much better to use the unicode-aware functions in the unicode library.
When tracing assignments, the ones made from Lua are flagged as being \global (BTW: What are these messages about `reassigning [no_local_whatsits]'?):
That is a bug, actually. It is currently using a non-initialized stack variable for the global/local decision, and I could make it do either.

Best wishes,
Taco
Hello,
Category codes are quite often changed dynamically, so that could become a problem in real life documents rather quickly, I'm afraid.
You are of course right. TeX avoids this problem by fixing a character's catcode when reading it; Lua would have to use not the current catcodetable, but the catcode of the characters it processes. Since Lua has no concept of catcodes, this would be impossible, at least without substantial changes to Lua.
And while we're at it: Should string.upper and string.lower use the \lccode/\uccode tables?
[...] Three: (and this is by far the most important one) using string.upper and string.lower is not a good idea. It is much better to use the unicode-aware functions in the unicode library.
Then I think string.upper and string.lower should point to the corresponding function in the unicode library to prevent confusion and bugs. Or is there a reason to keep the old functions around?
When tracing assignments, the ones made from Lua are flagged as being \global (BTW: What are these messages about `reassigning [no_local_whatsits]'?):
That is a bug, actually. It is currently using a non-initialized stack variable for the global/local decision, and I could make it do either.
I do not quite understand which part of the above text you are addressing. Which one is a bug? The global assignments made from Lua, or the reassigning of [no_local_whatsits]? Inferring from Hans' mail, I would suspect the latter.

Jonathan
Jonathan Sauer wrote:
And while we're at it: Should string.upper and string.lower use the \lccode/\uccode tables?
[...] Three: (and this is by far the most important one) using string.upper and string.lower is not a good idea. It is much better to use the unicode-aware functions in the unicode library.
Then I think string.upper and string.lower should point to the corresponding function in the unicode library to prevent confusion and bugs. Or is there a reason to keep the old functions around?
Nothing other than that they come from the core lua distribution. We have been talking about implementing our own utf-8 aware string functions instead of the current mix of core lua 8bit and unicode extension library functions. But there has been no time to do anything about it yet.
When tracing assignments, the ones made from Lua are flagged as being \global (BTW: What are these messages about `reassigning [no_local_whatsits]'?):
That is a bug, actually. It is currently using a non-initialized stack variable for the global/local decision, and I could make it do either.
I do not quite understand which part of the above text you are addressing. Which one is a bug? The global assignments made from Lua, or the reassigning of [no_local_whatsits]? Inferring from Hans' mail, I would suspect the latter.
Sorry for the confusion. The [no_local_whatsits] is an Aleph artifact that could be removed from the output, but is totally harmless. The use of global versus local assignment is totally random right now, and that is the bug I was talking about.

Best wishes,
Taco
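For readers trying this today: as far as I know, later LuaTeX releases settled the question so that assignments through tex.count respect grouping, and tex.setcount accepts an optional "global" first argument. This is an assumption about releases newer than the 2007 beta discussed in this thread, so please check the manual of your version. A minimal sketch under that assumption, with the plain format and illustrative messages:

```tex
% Sketch only; assumes a recent luatex with the plain format.
\catcode`@=11
\count@=5

\begingroup
  % Expected to be a local assignment in recent releases:
  \directlua{tex.count[255] = 42}
  \message{In group: \the\count@}
\endgroup
\message{After the local Lua assignment: \the\count@}

\begingroup
  % Explicitly global, the counterpart of \global\count255=17:
  \directlua{tex.setcount("global", 255, 17)}
\endgroup
\message{After the global Lua assignment: \the\count@}

\bye
```

Under the stated assumption, the first group should leave \count@ at 5 once it has ended, and only the second assignment should survive its group.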
Jonathan Sauer wrote:
Then I think string.upper and string.lower should point to the corresponding function in the unicode library to prevent confusion and bugs. Or is there a reason to keep the old functions around?
sure, because they are part of the official lua libraries and these are taken as is; of course you can redefine them yourself

  string.lower = unicode.utf8.lower

etc
When tracing assignments, the ones made from Lua are flagged as being \global (BTW: What are these messages about `reassigning [no_local_whatsits]'?):
That is a bug, actually. It is currently using a non-initialized stack variable for the global/local decision, and I could make it do either.
I do not quite understand which part of the above text you are addressing. Which one is a bug? The global assignments made from Lua, or the reassigning of [no_local_whatsits]? Inferring from Hans' mail, I would suspect the latter.
indeed

Hans
participants (3)
- Hans Hagen
- Jonathan Sauer
- Taco Hoekwater