sortedkeys and compare in LMTX l-table.lua
On table.sortedkeys the page https://www.contextgarden.net/Table_manipulation says: "Returns a sorted array of all the keys in table t. The comparer used for sorting will treat everything as a string iff the compared values are of mixed type." According to l-table.lua, sortedkeys first checks whether the keys in the given table are (1) all strings, (2) all numbers, or (3): neither (1) or (2). In cases (1) and (2) table.sort is used with the default comparer (Lua's '<'). In case (3) table.sort is used with a custom compare function named 'compare' and defined in l-table.lua. According to the sentence from the garden cited above, 'compare' should return (something equivalent) to tostring(a) < tostring(b). However, it does something more complex that, e.g., makes 10 < "2" < 3 < 10 ('<' meaning infix 'compare' here). I.e. there are strict circles in the strict order. https://www.contextgarden.net/Table_manipulation goes on explaining that, "In general this means that you will be fine as long as you avoid stuffing indices along with number hashes together in t, lest the order will end up confused depending on what type an element is compared with in what order (cf. example [2])." This reflects the way 'compare' actually is at the moment. But it contradicts the previous sentence "The comparer used for sorting will treat everything as a string iff the compared values are of mixed type." If everything is treated as a string when neither (1) nor (2), then everything is well-defined and there is no confusion. Why is 'compare' not just (something equivalent to) tostring(a) < tostring(b), violating the sentence cited above? Are there any meaningful uses of the way 'compare' is now? I suspect that the intent was to create an order that is lexicographic and numeric at the same time. As far as I see it, it just is not possible the way it was tried: Lexicographically, 11<2. Numerically, 2<11. I see three ways out: (1) Disallow mixed types of keys in tables given to sortedkeys. (2) Always use lexicographic order in the case of mixed key types. (3) Combine the orders by an order-theoretic sum. One possiblity: Every value that is neither a number nor a string is smaller than every number. Every number is smaller than every string. Otherwise elements are compared as they are now. Another possiblity: Define an order over all types, define an order for every type. Then sort by type first, and in type second. (1) may not possible without breaking a lot. (2) has sort of a discontinuity: E.g., add one numeric key to a table with only strings, and the order changes globally. (3) contains well defined, very predictable solutions without discontinuities.
On 1/26/2020 1:17 AM, Sebastian Miele wrote:
On table.sortedkeys the page https://www.contextgarden.net/Table_manipulation says: "Returns a sorted array of all the keys in table t. The comparer used for sorting will treat everything as a string iff the compared values are of mixed type."
According to l-table.lua, sortedkeys first checks whether the keys in the given table are (1) all strings, (2) all numbers, or (3): neither (1) or (2). In cases (1) and (2) table.sort is used with the default comparer (Lua's '<'). In case (3) table.sort is used with a custom compare function named 'compare' and defined in l-table.lua.
According to the sentence from the garden cited above, 'compare' should return (something equivalent) to tostring(a) < tostring(b). However, it does something more complex that, e.g., makes 10 < "2" < 3 < 10 ('<' meaning infix 'compare' here). I.e. there are strict circles in the strict order.
https://www.contextgarden.net/Table_manipulation goes on explaining that, "In general this means that you will be fine as long as you avoid stuffing indices along with number hashes together in t, lest the order will end up confused depending on what type an element is compared with in what order (cf. example [2])."
This reflects the way 'compare' actually is at the moment. But it contradicts the previous sentence "The comparer used for sorting will treat everything as a string iff the compared values are of mixed type." If everything is treated as a string when neither (1) nor (2), then everything is well-defined and there is no confusion.
Why is 'compare' not just (something equivalent to) tostring(a) < tostring(b), violating the sentence cited above? Are there any meaningful uses of the way 'compare' is now?
I suspect that the intent was to create an order that is lexicographic and numeric at the same time. As far as I see it, it just is not possible the way it was tried: Lexicographically, 11<2. Numerically, 2<11.
I see three ways out: (1) Disallow mixed types of keys in tables given to sortedkeys. (2) Always use lexicographic order in the case of mixed key types. (3) Combine the orders by an order-theoretic sum. One possiblity: Every value that is neither a number nor a string is smaller than every number. Every number is smaller than every string. Otherwise elements are compared as they are now. Another possiblity: Define an order over all types, define an order for every type. Then sort by type first, and in type second.
(1) may not possible without breaking a lot. (2) has sort of a discontinuity: E.g., add one numeric key to a table with only strings, and the order changes globally. (3) contains well defined, very predictable solutions without discontinuities. Normally keys are not mixed but when they are this is the way it's done. It's been so for more than a decade so unlikely to change.
These sorters are not used when typesetting because there we also deal with languages and such. If's mostly a convenience feature that makes sure we can handles hashes in a predictable order when needed (because hashes have random order, even per run) and it also tries to be efficient in doing so. Hans ----------------------------------------------------------------- Hans Hagen | PRAGMA ADE Ridderstraat 27 | 8061 GH Hasselt | The Netherlands tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl -----------------------------------------------------------------
Hans Hagen
On 1/26/2020 1:17 AM, Sebastian Miele wrote:
On table.sortedkeys the page https://www.contextgarden.net/Table_manipulation says: "Returns a sorted array of all the keys in table t. The comparer used for sorting will treat everything as a string iff the compared values are of mixed type."
According to l-table.lua, sortedkeys first checks whether the keys in the given table are (1) all strings, (2) all numbers, or (3): neither (1) or (2). In cases (1) and (2) table.sort is used with the default comparer (Lua's '<'). In case (3) table.sort is used with a custom compare function named 'compare' and defined in l-table.lua.
According to the sentence from the garden cited above, 'compare' should return (something equivalent) to tostring(a) < tostring(b). However, it does something more complex that, e.g., makes 10 < "2" < 3 < 10 ('<' meaning infix 'compare' here). I.e. there are strict circles in the strict order.
https://www.contextgarden.net/Table_manipulation goes on explaining that, "In general this means that you will be fine as long as you avoid stuffing indices along with number hashes together in t, lest the order will end up confused depending on what type an element is compared with in what order (cf. example [2])."
This reflects the way 'compare' actually is at the moment. But it contradicts the previous sentence "The comparer used for sorting will treat everything as a string iff the compared values are of mixed type." If everything is treated as a string when neither (1) nor (2), then everything is well-defined and there is no confusion.
Why is 'compare' not just (something equivalent to) tostring(a) < tostring(b), violating the sentence cited above? Are there any meaningful uses of the way 'compare' is now?
I suspect that the intent was to create an order that is lexicographic and numeric at the same time. As far as I see it, it just is not possible the way it was tried: Lexicographically, 11<2. Numerically, 2<11.
I see three ways out: (1) Disallow mixed types of keys in tables given to sortedkeys. (2) Always use lexicographic order in the case of mixed key types. (3) Combine the orders by an order-theoretic sum. One possiblity: Every value that is neither a number nor a string is smaller than every number. Every number is smaller than every string. Otherwise elements are compared as they are now. Another possiblity: Define an order over all types, define an order for every type. Then sort by type first, and in type second.
(1) may not possible without breaking a lot. (2) has sort of a discontinuity: E.g., add one numeric key to a table with only strings, and the order changes globally. (3) contains well defined, very predictable solutions without discontinuities.
These sorters are not used when typesetting because there we also deal with languages and such.
That's good to know.
If's mostly a convenience feature that makes sure we can handles hashes in a predictable order when needed (because hashes have random order, even per run) and it also tries to be efficient in doing so.
From my current perspective, it is very questionable whether 'predictable' is something that I would attach to the compare function. Even if 'predictable' only means 'it did not yet throw an error in the face of values of mixed type, as Lua's < would', it may not be so in the future, because compare is no strict partial order (10 < "2" < 3 < 10), and therefore violates the specification of table.sort (and any linear sorting in general). And even if table.sort does never detect this violation, it will give different results on different permutations of the same sequence.
Moreover, e.g. the example for n,key in next,table.sortedkeys(mixed) do io.write(string.format("[%2i] [%2s] (t:%6s) -> “%s”\n", n, key, type(key), mixed[key])) end from https://www.contextgarden.net/Table_manipulation returns different results at least on different runs. If the only real purpose of compare is to make sure we can handle keys of mixed types, then no sort at all is most efficient, although it gives different results on different runs, as the current compare does, too. If compare is not used when typesetting, then efficiency may not be important, and something well-defined and total may be used.
Normally keys are not mixed but when they are this is the way it's done. It's been so for more than a decade so unlikely to change.
That's fine. I want to admit that I felt strong aversion when I first saw compare. "Linear sorting" by violating the preconditions of linear sorting strongly goes against my upbringing. But ConTeXt/LuaTeX/LuaMetaTeX lives, thrives, and has quite some history. I will meditate and learn to live with it and, after hereby having peeked your attitude on such issues, bother neither me nor you with something like that again. (Except it makes something explode.) All the best Sebastian
On 1/28/2020 1:10 AM, Sebastian Miele wrote:
Moreover, e.g. the example
for n,key in next,table.sortedkeys(mixed) do io.write(string.format("[%2i] [%2s] (t:%6s) -> “%s”\n", n, key, type(key), mixed[key])) end
from https://www.contextgarden.net/Table_manipulation returns different results at least on different runs.
you normally don't iterate over an array that way, compare: local t = { 11, "44", 22, "33" } for k, v in table.sortedhash(t) do print(k,v,type(k),type(v)) end local m = table.sortedkeys(t) for k, v in next, m do print(k,v,type(k),type(v)) end local m = table.sortedkeys(t) for k=1,#m do local v = t[k] print(k,v,type(k),type(v)) end Hans ----------------------------------------------------------------- Hans Hagen | PRAGMA ADE Ridderstraat 27 | 8061 GH Hasselt | The Netherlands tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl -----------------------------------------------------------------
Hans Hagen
On 1/28/2020 1:10 AM, Sebastian Miele wrote:
Moreover, e.g. the example
for n,key in next,table.sortedkeys(mixed) do io.write(string.format("[%2i] [%2s] (t:%6s) -> “%s”\n", n, key, type(key), mixed[key])) end
from https://www.contextgarden.net/Table_manipulation returns different results at least on different runs.
you normally don't iterate over an array that way, compare:
local t = { 11, "44", 22, "33" }
This is an array. No mixed keys. I do not get the point.
for k, v in table.sortedhash(t) do print(k,v,type(k),type(v)) end
local m = table.sortedkeys(t)
for k, v in next, m do print(k,v,type(k),type(v)) end
local m = table.sortedkeys(t) for k=1,#m do local v = t[k] print(k,v,type(k),type(v)) end
With just integer keys they all give essentially the same and expected result. I tried to change t into local t = { [ 11 ] = 11 , ["44"] = "44" , [ 22 ] = 22 , ["33"] = "33" } in case that was what you meant. But in this example numeric and lexicographic order agree. So nothing unexpected here, too. The only noteworthy thing is that "for k, v in next, m" indeed traverses in order here, although, according to the Lua manual, "the order in which the indices are enumerated is not specified, even for numeric indices". https://www.contextgarden.net/Table_manipulation often uses next to iterate even over sorted results, which according to the Lua manual is not guaranteed to yield the expected (ordered) results (in the future). The end of the page mentions that ipairs is slower than next or numeric for's. The text on table.prepend mentions that after prepending, iteration via next indeed does not traverse in order any more, and a numeric for is appropriate. So, if https://www.contextgarden.net/Table_manipulation reflects the way it is done in ConTeXt, it seems to be customary to partly rely on Lua's (current) implementation details beyond Lua's specification. But this shall be fine with me, too. Should some future implementation of Lua change implementation details in a relevant way here, you almost certainly will notice it first, and quickly know what to do. :-)
participants (2)
-
Hans Hagen
-
Sebastian Miele