On 1-1-2012 10:32, Paul Isambert wrote:
> Reinhard Kotucha <reinhard.kotucha at web.de> wrote:
>> Hi Taco,
>> though luatex isn't frozen yet, I think that many people are using it
>> already and drastic changes are not desirable.
>> There is one thing which could probably be improved without breaking
>> existing scripts, as far as I can see:
>>    string.explode(foo, ' +')
>> expects that ' ' is a space token (ASCII 0x20).  Is it possible to
>> change string.explode() so that ' ' can be either a space token (ASCII
>> 0x20) or a tabulator (ASCII 0x09) without breaking existing scripts?
> By definition, this would break those scripts that use string.explode()
> expecting spaces aren't tabs. Personally, I wouldn't mind if the function
> was modified in order to understand regular expressions, although that
> would quite clearly be incompatible with previous behavior.

in that case one could use the normal string matching functions or lpeg
... there is no need to burden luatex with large regexp libraries or
other clever tricks. also, once space is interpreted as either space or
tab, we end up with requests for more of that (non-breaking space,
unicode spacing, etc.), and it defeats the purpose of the explode
function: any added interpretation of the split pattern is one more
argument for using the regular lua string functions

local gmatch = string.gmatch

local explode = function(s,p)
     local t = { }
     for w in gmatch(s,p) do
         if w ~= "" then
             t[#t+1] = w
         end
     end
     return t
end

local t = explode(str,"[^\t ]+")

works quite ok (the space-only variant is some 30% slower than the
built-in explode, but that is hardly measurable). Adding space
interpretation to the built-in explode function would make it slower.
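
To make the difference between the two patterns concrete, here is a
minimal self-contained sketch in plain Lua (no luatex extensions
needed). The helper name split and the sample string are made up for
illustration only; this is not how string.explode is implemented.

```lua
-- Illustrative sketch: split a string on separators using only the
-- standard string library. 'split' and 'sample' are hypothetical names.
local gmatch = string.gmatch

local function split(s, p)
    local t = { }
    -- p matches the fields themselves, i.e. runs of non-separator chars
    for piece in gmatch(s, p) do
        t[#t+1] = piece
    end
    return t
end

local sample = "alpha\tbeta  gamma"       -- one tab, then two spaces

local fields = split(sample, "[^\t ]+")   -- space or tab separates fields
local words  = split(sample, "[^ ]+")     -- space only: the tab stays inside a field

print(#fields, #words)
```

With the first pattern the tab acts as a separator, so "alpha" and
"beta" end up in different fields; with the space-only pattern they
stay together in one field. Which of the two is wanted depends on the
script, which is why the choice is better left to the caller.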


