On 1-1-2012 10:32, Paul Isambert wrote:
Reinhard Kotucha wrote:

Hi Taco,

though LuaTeX isn't frozen yet, I think that many people are already using it, and drastic changes are not desirable.
There is one thing which could probably be improved without breaking existing scripts, as far as I can see:
string.explode(foo, ' +')
expects that ' ' is a space token (ASCII 0x20). Is it possible to change string.explode() so that ' ' can be either a space token (ASCII 0x20) or a tab (ASCII 0x09) without breaking existing scripts?
By definition, this would break those scripts that use string.explode() and rely on spaces not matching tabs. Personally, I wouldn't mind if the function were modified to understand regular expressions, although that would quite clearly be incompatible with the previous behavior.
In that case one could use the normal string matching functions or lpeg. There is no need to burden LuaTeX with large regexp libraries or other clever tricks. Also, by interpreting a space as either a space or a tab, we end up with more of that (unbreakable space, Unicode spacing, and so on), and it defeats the purpose of the explode function: any added interpretation of the split pattern is one more argument for using the regular Lua string functions.

    local gmatch = string.gmatch

    local explode = function(s,p)
        local t = { }
        -- collect every non-empty match of pattern p
        for s in gmatch(s,p) do
            if s ~= "" then
                t[#t+1] = s
            end
        end
        return t
    end

    -- str is the string to split; fields are runs of non-tab, non-space characters
    local t = explode(str,"[^\t ]+")

This works quite OK (the space-only variant is some 30% slower than explode, but that is hardly measurable). Adding space interpretation to the built-in explode function would make it slower.

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
| www.pragma-pod.nl
-----------------------------------------------------------------
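
As a rough illustration of the gmatch approach above, the following sketch compares splitting on spaces only with splitting on spaces or tabs, using plain Lua only. The helper name split_on and the sample string are assumptions made for this example; they are not part of LuaTeX, and the helper only mimics the run-collapsing behaviour of a ' +' separator rather than everything string.explode() does.

```lua
-- Hypothetical helper (the name is an assumption, not a LuaTeX function):
-- split `s` on runs of any of the characters listed in `seps`.
-- `seps` must contain at least one character.
local function split_on(s, seps)
  local fields = {}
  -- Escape every non-alphanumeric separator so it stays literal
  -- inside the character class built below.
  local class = seps:gsub("%W", "%%%0")
  for field in s:gmatch("[^" .. class .. "]+") do
    fields[#fields + 1] = field
  end
  return fields
end

local line = "alpha\tbeta  gamma"

-- Space as the only separator: the tab stays inside the first field.
local by_space = split_on(line, " ")

-- Space or tab as separator: three separate fields.
local by_blank = split_on(line, " \t")

print(#by_space, #by_blank)
```

Because the separator set is an ordinary argument, a caller who also wants other characters treated as separators can list them there, and the built-in explode stays untouched.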