Hi Taco,

though luatex isn't frozen yet, I think that many people are using it already and drastic changes are not desirable.

There is one thing which could probably be improved without breaking existing scripts, as far as I can see:

  string.explode(foo, ' +')

expects that ' ' is a space character (ASCII 0x20). Is it possible to change string.explode() so that ' ' can be either a space (ASCII 0x20) or a tab (ASCII 0x09) without breaking existing scripts?

Regards,
  Reinhard

-- 
----------------------------------------------------------------------------
Reinhard Kotucha                                      Phone: +49-511-3373112
Marschnerstr. 25
D-30167 Hannover                              mailto:reinhard.kotucha@web.de
----------------------------------------------------------------------------
Microsoft isn't the answer. Microsoft is the question, and the answer is NO.
----------------------------------------------------------------------------
Reinhard Kotucha wrote:
> Hi Taco, though luatex isn't frozen yet, I think that many people are using it already and drastic changes are not desirable.
> There is one thing which could probably be improved without breaking existing scripts, as far as I can see:
>   string.explode(foo, ' +')
> expects that ' ' is a space character (ASCII 0x20). Is it possible to change string.explode() so that ' ' can be either a space (ASCII 0x20) or a tab (ASCII 0x09) without breaking existing scripts?
By definition, this would break the scripts that use string.explode() and rely on tabs not being treated as spaces. Personally, I wouldn't mind if the function were modified to understand regular expressions, although that would quite clearly be incompatible with previous behavior.

Best,
Paul
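To make the compatibility concern concrete, the following plain-Lua sketch contrasts a space-only split with one that also treats tabs as separators. It does not call LuaTeX's string.explode(); the helper fields() is hypothetical and only approximates the ' +' case with string.gmatch, so edge cases such as leading separators are not claimed to behave exactly like explode().

```lua
-- Hypothetical helper (not part of LuaTeX): collect every match of
-- pattern in s into an array-style table.
local function fields(s, pattern)
  local t = {}
  for f in string.gmatch(s, pattern) do
    t[#t + 1] = f
  end
  return t
end

local line = "alpha\tbeta gamma"

-- Space-only splitting: the tab stays inside the first field.
local spaces_only = fields(line, "[^ ]+")
-- spaces_only == { "alpha\tbeta", "gamma" }

-- Splitting on spaces and tabs: the same line now yields three fields,
-- so a script that relied on the first result would see different data.
local spaces_and_tabs = fields(line, "[^ \t]+")
-- spaces_and_tabs == { "alpha", "beta", "gamma" }
```

The field count itself changes, which is exactly the kind of silent difference an existing script would not expect.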
On 1-1-2012 10:32, Paul Isambert wrote:
> By definition, this would break the scripts that use string.explode() and rely on tabs not being treated as spaces. Personally, I wouldn't mind if the function were modified to understand regular expressions, although that would quite clearly be incompatible with previous behavior.
In that case one could use the normal string matching functions or lpeg; there is no need to burden luatex with large regexp libraries or other clever tricks. Also, by interpreting space as either space or tab we end up with more of that (non-breaking space etc., Unicode spacing), and it defeats the purpose of the explode function: any added interpretation of the split pattern is one more argument for using the regular Lua string functions.

    local gmatch = string.gmatch -- explode() below relies on this local

    local explode = function(s,p)
        local t = { }
        for s in gmatch(s,p) do -- collect the non-empty matches of p in s
            if s ~= "" then
                t[#t+1] = s
            end
        end
        return t
    end

    local t = explode(str,"[^\t ]+") -- fields separated by spaces and/or tabs

works quite well (the space-only variant is some 30% slower than explode, but the difference is hardly measurable). Adding space interpretation to the built-in explode function would make it slower.

Hans

-----------------------------------------------------------------
                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
    tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
                                             | www.pragma-pod.nl
-----------------------------------------------------------------
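Along the lines Hans suggests, the separator set does not have to be hard-wired into the pattern. The sketch below uses a hypothetical helper, split_on(), built only on the standard Lua string functions: it turns a string of single-byte separator characters into a negated character class. Because Lua patterns work on bytes, multi-byte UTF-8 separators such as the non-breaking space (U+00A0) cannot be added to such a class, which is one reason this kind of space interpretation quickly gets complicated.

```lua
-- Hypothetical helper (not part of LuaTeX or standard Lua): split s on any
-- of the single-byte separator characters listed in seps, dropping empty
-- fields, in the same spirit as explode(str, "[^\t ]+").
local function split_on(s, seps)
  assert(type(seps) == "string" and #seps > 0, "need at least one separator")
  -- Prefix every non-alphanumeric character with '%' so that pattern
  -- magic characters such as '-', ']' or '^' are taken literally in the set.
  local class = (seps:gsub("%W", "%%%0"))
  local t = {}
  -- "[^...]+" matches maximal runs of non-separator bytes, so no field is empty.
  for field in string.gmatch(s, "[^" .. class .. "]+") do
    t[#t + 1] = field
  end
  return t
end

-- Usage: spaces and tabs as separators.
local words = split_on("one two\t\tthree ", " \t")
-- words == { "one", "two", "three" }
```

With seps = " \t" this gives the same fields as Hans's explode(str, "[^\t ]+"); the helper only saves writing the character class by hand.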
On 2012-01-01 at 12:28:06 +0100, Hans Hagen wrote:
> In that case one could use the normal string matching functions or lpeg; there is no need to burden luatex with large regexp libraries or other clever tricks. Also, by interpreting space as either space or tab we end up with more of that (non-breaking space etc., Unicode spacing), and it defeats the purpose of the explode function: any added interpretation of the split pattern is one more argument for using the regular Lua string functions.
>
>     local gmatch = string.gmatch -- explode() below relies on this local
>
>     local explode = function(s,p)
>         local t = { }
>         for s in gmatch(s,p) do -- collect the non-empty matches of p in s
>             if s ~= "" then
>                 t[#t+1] = s
>             end
>         end
>         return t
>     end
>
>     local t = explode(str,"[^\t ]+") -- fields separated by spaces and/or tabs
>
> works quite well (the space-only variant is some 30% slower than explode, but the difference is hardly measurable). Adding space interpretation to the built-in explode function would make it slower.
Thank you Hans, this is even much better than what I asked for.

Regards,
  Reinhard
participants (3)
- Hans Hagen
- Paul Isambert
- Reinhard Kotucha