final thoughts on experiments with lua
I've finished testing my Lua script, and it does exactly what I need. I think I'll write a small article about it for one of the next ConTeXt Group proceedings, but wanted to just give a very brief summary that might be of interest to some:

Pure Lua is wonderful, but as the language is deliberately kept very small, one sometimes has to find rather cumbersome workarounds. If you run Lua under ConTeXt (with mtxrun --script), you get a version with batteries included, which is perfect for manipulating and analyzing text. Some areas that I found particularly impressive:

1. I've tried several times to make use of lpeg. Hans' code is full of it, and I know that it's fast and extremely versatile, but I found it difficult to wrap my head around its functioning and write useful code with it. The ConTeXt wrapper makes this extremely easy; this is lpeg for the rest of us. It's really a delight to build patterns and see them work immediately!

2. The utf library and string manipulation with characters.(...) are absolutely necessary if you want to handle non-ASCII text, because pure Lua gives very unexpected results in this area. These operations work wonderfully with the ConTeXt libraries.

3. Lua's handling of tables is very efficient and fast. For analyzing my Greek texts, I have to use huge tables for morphological parsing, with more than 900,000 entries. Looking up words in these tables is around 3x faster in Lua than in Python!

One final thought: one limitation that I still find cumbersome to work around is the fact that associative arrays ("pairs" in Lua speak) do not have an order. When I analyze my texts, I want book numbers, chapters, paragraphs preserved in the order in which they are read (entered into the table). In many cases, it is not possible (or extremely awkward) to sort these numbers, since chapters may be numbered something like 2, 2a, 3, 3α, 3β etc. Python has the OrderedDict() in its collections module. In Lua, the best I could find was entering the chapter numbers into an array (ipairs) and then retrieving them from there. Maybe there is a better way?

All of this just to say how grateful I am for the way Hans, Taco, Wolfgang, Luigi, and the other developers have enhanced Lua. Thanks guys, you make my work much more pleasant and efficient!

Thomas
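As a small illustration of point 2 above (this is not taken from Thomas's script): plain Lua strings are byte sequences, so the length operator counts bytes rather than characters. The sketch assumes Lua 5.3 or later and uses only the standard utf8 library; the ConTeXt helpers mentioned in the message are not shown here.

```lua
-- Assumes Lua 5.3+ (standard utf8 library and \u{...} escapes).
local label = "3\u{3B1}"          -- the chapter label "3α"

print(#label)                      -- number of bytes
print(utf8.len(label))             -- number of characters (code points)

-- Walk the string character by character instead of byte by byte.
local chars = {}
for _, codepoint in utf8.codes(label) do
  chars[#chars + 1] = utf8.char(codepoint)
end
print(table.concat(chars, " | "))
```

Here #label is 3 while utf8.len(label) is 2, because α occupies two bytes in UTF-8.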
On Wed, Jan 9, 2019 at 8:57 PM Thomas A. Schmitz
One final thought: one limitation that I still find cumbersome to work around is the fact that associative arrays ("pairs" in Lua speak) do not have an order. When I analyze my texts, I want book numbers, chapters, paragraphs preserved in the order in which they are read (entered into the table). In many cases, it is not possible (or extremely awkward) to sort these numbers, since chapters may be numbered something like 2, 2a, 3, 3α, 3β etc. python has the OrderedDict() in its collections module. In Lua, the best I could find was entering the chapter numbers into an array (ipair) and then retrieve it from there. Maybe there is a better way?
table.sort (list [, comp])

Sorts list elements in a given order, in-place, from list[1] to list[#list]. If comp is given, then it must be a function that receives two list elements and returns true when the first element must come before the second in the final order (so that, after the sort, i < j implies not comp(list[j],list[i])). If comp is not given, then the standard Lua operator < is used instead.

--
luigi
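A minimal sketch of what a comp function for chapter labels could look like, assuming every label starts with digits followed by an optional suffix. The names split and before are hypothetical helpers made up for this illustration, and the suffixes are compared as raw bytes.

```lua
-- Labels as they might be collected, deliberately out of order.
local sections = { "3", "1", "2a", "2" }

-- Hypothetical helper: split "2a" into the number 2 and the suffix "a".
local function split(label)
  local num, suffix = label:match("^(%d+)(.*)$")
  return tonumber(num), suffix
end

-- comp function: true when a must come before b.
local function before(a, b)
  local na, sa = split(a)
  local nb, sb = split(b)
  if na ~= nb then
    return na < nb      -- numeric part decides first
  end
  return sa < sb        -- then the suffix, compared byte-wise
end

table.sort(sections, before)
print(table.concat(sections, " "))
```

This handles 2, 2a, 3, but every new numbering scheme (mixed Greek and Latin suffixes, lines that are out of sequence for historical reasons) needs another special case in the comparator.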
On Wed, 9 Jan 2019 20:57:21 +0100
"Thomas A. Schmitz"
3. Lua's handling of tables is very efficient and fast. For analyzing my Greek texts, I have to use huge tables for morphological parsing, with more than 900,000 entries. Looking up words in these tables is around 3x faster in Lua than in python!
I have found, in my limited use (and understanding) of Lua for data analysis, that it is indeed VERY fast: it not only beats Python beyond comparison, it also handles quite large data sets without choking. It has become fashionable to use hardware GPUs to speed up parallel calculation tasks, and this is now often done using libraries that have Python bindings. I wonder whether Python becomes the limiting factor in those applications?

Alan
On 1/9/2019 9:38 PM, Alan Braslau wrote:
On Wed, 9 Jan 2019 20:57:21 +0100 "Thomas A. Schmitz" wrote:

3. Lua's handling of tables is very efficient and fast. For analyzing my Greek texts, I have to use huge tables for morphological parsing, with more than 900,000 entries. Looking up words in these tables is around 3x faster in Lua than in python!
I have found, in my limited use (and understanding) of lua for data analysis is that it is indeed VERY fast, and not only beats python without comparison, it also handles quite large data sets without choking.
and with a bit of optimization one can often squeeze out more

It has become fashionable to use hardware GPUs to speed-up parallel calculation tasks, and this is now often done using libraries having python bindings. I wonder if python becomes the limiting factor in those applications?

Hans
----------------------------------------------------------------- Hans Hagen | PRAGMA ADE Ridderstraat 27 | 8061 GH Hasselt | The Netherlands tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl -----------------------------------------------------------------
On 1/9/2019 8:57 PM, Thomas A. Schmitz wrote:
I've finished testing my lua script, and it does exactly what I need. I think I'll write a small article about it for one of the next context group proceedings, but wanted to just give a very brief summary that might be of interest to some:
Pure Lua is wonderful, but as the language is deliberately kept very small, one sometimes has to find rather cumbersome workarounds. If you run Lua under ConTeXt (with mtxrun --script), you get a version with batteries included, which is perfect for manipulating and analyzing text. Some areas that I found particularly impressive:
1. I've tried several times to make use of lpegs. Hans' code is full of it, and I know that it's fast and extremely versatile, but I found it difficult to wrap my head around its functioning and write useful code with lpegs. The ConTeXt wrapper makes this extremely easy; this is lpeg for the rest of us. It's really a delight to build patterns and see them work immediately!
2. The utf library and string manipulation with characters.(...) is absolutely necessary if you want to handle non-ASCII text because pure Lua gives very unexpected results in this area. These operations work wonderfully with the context libraries.
3. Lua's handling of tables is very efficient and fast. For analyzing my Greek texts, I have to use huge tables for morphological parsing, with more than 900,000 entries. Looking up words in these tables is around 3x faster in Lua than in python!

One final thought: one limitation that I still find cumbersome to work around is the fact that associative arrays ("pairs" in Lua speak) do not have an order. When I analyze my texts, I want book numbers, chapters, paragraphs preserved in the order in which they are read (entered into the table). In many cases, it is not possible (or extremely awkward) to sort these numbers, since chapters may be numbered something like 2, 2a, 3, 3α, 3β etc. python has the OrderedDict() in its collections module. In Lua, the best I could find was entering the chapter numbers into an array (ipair) and then retrieve it from there. Maybe there is a better way?
for k, v in table.sortedhash(t) do
    ....
end

or, if it's sequential, you can add to an indexed table (or use a sorted index for complex cases)

it all depends on use ... if you can be more specific ...
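For readers without the ConTeXt libraries at hand, the idea of iterating a hash in sorted key order can be sketched in plain Lua as follows. The function sorted_pairs is a hypothetical name invented for this example; it is not ConTeXt's table.sortedhash and makes no claim about how that function is implemented.

```lua
-- Hypothetical plain-Lua helper: iterate over t in sorted key order.
-- comp is optional and is passed straight to table.sort.
local function sorted_pairs(t, comp)
  local keys = {}
  for k in pairs(t) do
    keys[#keys + 1] = k
  end
  table.sort(keys, comp)

  local i = 0
  return function()
    i = i + 1
    local k = keys[i]
    if k ~= nil then
      return k, t[k]
    end
  end
end

local words = { ["2a"] = { "c", "d" }, ["1"] = { "a", "b" } }

for k, v in sorted_pairs(words) do
  print(k, table.concat(v, " "))
end
```

Note that this sorts the keys again on every call and assumes they are mutually comparable (all strings or all numbers); it gives a sorted order, not the order of insertion.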
All of this just to say how grateful I am for the way Hans, Taco, Wolfgang, Luigi, and the other developers have enhanced Lua. Thanks guys, you make my work much more pleasant and efficient!

Thanks

Hans Hagen
On 10. Jan 2019, at 01:08, Hans Hagen wrote:

it all depends on use ... if you can be more specific ...
Hans, Luigi,

thanks for your hints on list sorting - they are appreciated, but I’ve been there many many times: it’s impossible to be more specific because numbering can be unexpectedly weird. Combinations of Greek and Roman letters, sometimes (for historic reasons) even lines that are out of numeric sequence. I’ve tried to catch these exceptions in sort functions, only to have to add even more ifs and buts when I was processing the next author. And I’m pretty sure that the solution is not in sorting a table index: the correct sequence is already in the source, it just has to be preserved.

What I do now, in a nutshell: I have tables such as

sections = { "1", "2", "2a" }
words = { ["1"] = { "a", "b" }, ["2a"] = { "c", "d" } }

so I can iterate through ipairs(sections) in sequence and pick up the word lists for each section. In the greater scheme of things, as Hraban pointed out: if there were an “ordered table” structure in Lua, this is precisely what it would do behind the scenes; it would just make it easier for the user.

Best
Thomas
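The two-table approach described above can be sketched in plain Lua like this. The helper add_word and the sample labels are made up for illustration; only the sections and words tables come from the message.

```lua
local sections = {}   -- section labels in the order they were first seen
local words = {}      -- section label -> list of words

-- Hypothetical helper: record a word under a section, remembering
-- the section's position the first time it appears.
local function add_word(section, word)
  local list = words[section]
  if not list then
    list = {}
    words[section] = list
    sections[#sections + 1] = section
  end
  list[#list + 1] = word
end

-- Labels arrive in source order, which is not a sortable order.
add_word("1", "a")
add_word("1", "b")
add_word("2a", "c")
add_word("2", "d")
add_word("2a", "e")

for _, section in ipairs(sections) do
  print(section, table.concat(words[section], " "))
end
```

The loop visits 1, 2a, 2 in exactly the order in which the labels first appeared, with no sorting involved.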
On Thu, Jan 10, 2019 at 10:27 AM Schmitz Thomas A. < thomas.schmitz@uni-bonn.de> wrote:
On 10. Jan 2019, at 01:08, Hans Hagen wrote:

it all depends on use ... if you can be more specific ...
Hans, Luigi,
thanks for your hints on list sorting - they are appreciated, but I’ve been there many many times: it’s impossible to be more specific because numbering can be unexpectedly weird. Combinations of Greek and Roman letters, sometimes (for historic reasons) even lines that are out of numeric sequence. I’ve tried to catch these exceptions in sort functions, only to have to add even more ifs and buts when I was processing the next author. And I’m pretty sure that the solution is not in sorting a table index: the correct sequence is already in the source, it just has to be preserved. What I do now, in a nutshell: I have tables such as
sections = { "1", "2", "2a" }
words = { ["1"] = { "a", "b" }, ["2a"] = { "c", "d" } }
so I can iterate through ipairs(sections) in sequence and pick up the word lists for each section. In the greater scheme of things, as Hraban pointed out: if there were an “ordered table” structure in Lua, this is precisely what it would do behind the scenes; it would just make it easier for the user.
The point is that I believe this is also doable in Lua... Maybe it would be helpful to have a meaningful example in Python, to see if we can mimic it in Lua?

--
luigi
On 2019-01-10 at 10:50, luigi scarso wrote:

sections = { "1", "2", "2a" }
words = { ["1"] = { "a", "b" }, ["2a"] = { "c", "d" } }

so I can iterate through ipairs(sections) in sequence and pick up the word lists for each section. In the greater scheme of things, as Hraban pointed out: if there were an “ordered table” structure in Lua, this is precisely what it would do behind the scenes; it would just make it easier for the user.

The point is that I believe this is also doable in Lua... Maybe it would be helpful to have a meaningful example in Python, to see if we can mimic it in Lua?
The "minimal example" in Python is a collections.OrderedDict. It’s not about ordering the entries (anew), but keeping the order, i.e. retrieving the entries in the same order as you added them. If you iterate over a Python dict (before Python 3.7) or over a Lua table with pairs, the order can be arbitrary.

Greetlings, Hraban
---
https://www.fiee.net
http://wiki.contextgarden.net
https://www.dreiviertelhaus.de
GPG Key ID 1C9B22FD
On 1/10/2019 12:11 PM, Henning Hraban Ramm wrote:
On 2019-01-10 at 10:50, luigi scarso wrote:

sections = { "1", "2", "2a" }
words = { ["1"] = { "a", "b" }, ["2a"] = { "c", "d" } }

so I can iterate through ipairs(sections) in sequence and pick up the word lists for each section. In the greater scheme of things, as Hraban pointed out: if there were an “ordered table” structure in Lua, this is precisely what it would do behind the scenes; it would just make it easier for the user.

The point is that I believe this is also doable in Lua... Maybe it would be helpful to have a meaningful example in Python, to see if we can mimic it in Lua?
The "minimal example" in Python is a collections.OrderedDict. It’s not about ordering the entries (anew), but keeping the order, i.e. retrieving the entries in the same order as you added them. If you iterate over a Python dict or a Lua pairs table, the order can be arbitrary.
I'll add this:

local t = table.orderedhash()

t["1"]  = { "a", "b" }
t["2"]  = { }
t["2a"] = { "a", "c", "d" }

for k, v in table.ordered(t) do
    print(k)
    inspect(v)
end

which gives

1
table={
 "a",
 "b",
}
2
table={
}
3
table={
 "a",
 "c",
 "d",
}

Hans Hagen
On Thu, Jan 10, 2019 at 12:11 PM Henning Hraban Ramm
a Lua pairs table, the order can be arbitrary.
Sure, the *default* pairs gives a pseudo-arbitrary order, but you can always use a metatable:

--
-- test.lua
--
local _c, _t = 0, {}   -- insertion counter and list of keys in insertion order
local t = {}
setmetatable(t, {
  -- called only for keys not yet present: record the key, then store the value
  __newindex = function(tbl, k, v)
    _c = _c + 1
    _t[_c] = k
    rawset(tbl, k, v)
  end,
  -- make pairs(t) walk the keys in insertion order (needs Lua 5.2 or later)
  __pairs = function(tbl)
    local i = 1
    return function(tbl, k)
      k = _t[i]
      i = i + 1
      return k, tbl[k]
    end, tbl, nil
  end,
})

t['a']  = 'aaa'
t['3a'] = '3aa'
t['b']  = 'baa'
t['2b'] = '2ba'
t['c']  = 'caa'
t['2c'] = '2ca'
t['d']  = 'daa'
t['2d'] = '2da'
t['e']  = 'eaa'
t['2e'] = '2ea'

for k, v in pairs(t) do
  print(k, v)
end

$ mtxrun --script test.lua
a	aaa
3a	3aa
b	baa
2b	2ba
c	caa
2c	2ca
d	daa
2d	2da
e	eaa
2e	2ea

--
luigi
On 2019-01-09 at 20:57, Thomas A. Schmitz wrote:
I've finished testing my lua script, and it does exactly what I need. I think I'll write a small article about it for one of the next context group proceedings,
Hi Thomas, since I’m just starting to work on the current CG journal (sorry, life happens...), I might fit your article in. And a German version would be welcome for DANTE’s DTK.
3. Lua's handling of tables is very efficient and fast. For analyzing my Greek texts, I have to use huge tables for morphological parsing, with more than 900,000 entries. Looking up words in these tables is around 3x faster in Lua than in python!
Nice to know. Maybe I should try to convert some of my Python code (esp. the ConTeXt related scripts)...
One final thought: one limitation that I still find cumbersome to work around is the fact that associative arrays ("pairs" in Lua speak) do not have an order. When I analyze my texts, I want book numbers, chapters, paragraphs preserved in the order in which they are read (entered into the table). In many cases, it is not possible (or extremely awkward) to sort these numbers, since chapters may be numbered something like 2, 2a, 3, 3α, 3β etc. python has the OrderedDict() in its collections module. In Lua, the best I could find was entering the chapter numbers into an array (ipair) and then retrieve it from there. Maybe there is a better way?
Since Python’s usual dict is also unordered (at least before Python 3.7, where insertion order became guaranteed), I guess OrderedDict also just uses an index array under the hood. Python makes it easy to create new classes (also data types) that behave like something known. Don’t know how that would look in Lua...

Python’s batteries are much bigger than Lua(TeX)’s. Of course Lua should be kept small, but there are always things missing. My Lua library for invoices with ConTeXt e.g. uses an object model from an obscure Lua library. Probably it would make more sense to use Lua’s tables in a native way than to insist on objects. I also missed a good GUI library (with widgets); I chose tekui and added some GUI-from-configuration stuff, but got stuck, since I’m comfortable with my CLI scripts anyway...

Greetlings, Hraban
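On the question of how a "class that behaves like something known" might look in Lua: one possibility is a constructor that returns a table whose metatable keeps the insertion order. The sketch below is an assumption-laden illustration in plain Lua, not ConTeXt's table.orderedhash; ordered_table is a hypothetical name, and __pairs is only honoured by pairs in Lua 5.2 and later.

```lua
-- Hypothetical constructor for a table that remembers insertion order.
local function ordered_table()
  local keys = {}     -- keys in the order they were first assigned
  local seen = {}     -- set of keys already recorded in `keys`
  local values = {}   -- real storage; the returned proxy stays empty
  return setmetatable({}, {
    __index = values,
    -- The proxy never holds keys itself, so this fires on every assignment.
    __newindex = function(_, k, v)
      if not seen[k] then
        seen[k] = true
        keys[#keys + 1] = k
      end
      values[k] = v
    end,
    -- pairs(t) walks the keys in insertion order, skipping removed entries.
    __pairs = function()
      local i = 0
      return function()
        while true do
          i = i + 1
          local k = keys[i]
          if k == nil then return nil end
          local v = values[k]
          if v ~= nil then return k, v end
        end
      end
    end,
  })
end

local t = ordered_table()
t["2"]  = "two"
t["2a"] = "two-a"
t["1"]  = "one"
t["2"]  = "TWO"      -- overwriting keeps the original position

for k, v in pairs(t) do
  print(k, v)
end
```

Each call to ordered_table() gets its own private key list, so several ordered tables can coexist. A key that is set to nil and later assigned again returns to its original position in this sketch; whether that is desirable depends on the use case.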
participants (6): Alan Braslau, Hans Hagen, Henning Hraban Ramm, luigi scarso, Schmitz Thomas A., Thomas A. Schmitz