[dev-context] sortedkeys and compare in LMTX l-table.lua

Sebastian Miele sebastian.miele at gmail.com
Tue Jan 28 01:10:35 CET 2020

Hans Hagen <j.hagen at xs4all.nl> writes:

> On 1/26/2020 1:17 AM, Sebastian Miele wrote:
> > On table.sortedkeys the page
> > https://www.contextgarden.net/Table_manipulation says: "Returns a sorted
> > array of all the keys in table t. The comparer used for sorting will
> > treat everything as a string iff the compared values are of mixed type."
> >
> > According to l-table.lua, sortedkeys first checks whether the keys in
> > the given table are (1) all strings, (2) all numbers, or (3): neither
> > (1) or (2). In cases (1) and (2) table.sort is used with the default
> > comparer (Lua's '<'). In case (3) table.sort is used with a custom
> > compare function named 'compare' and defined in l-table.lua.
> >
> > According to the sentence from the garden cited above, 'compare' should
> > return (something equivalent) to tostring(a) < tostring(b). However, it
> > does something more complex that, e.g., makes 10 < "2" < 3 < 10 ('<'
> > meaning infix 'compare' here). I.e. there are strict circles in the
> > strict order.
> >
> > https://www.contextgarden.net/Table_manipulation goes on explaining
> > that, "In general this means that you will be fine as long as you avoid
> > stuffing indices along with number hashes together in t, lest the order
> > will end up confused depending on what type an element is compared with
> > in what order (cf. example [2])."
> >
> > This reflects the way 'compare' actually is at the moment. But it
> > contradicts the previous sentence "The comparer used for sorting will
> > treat everything as a string iff the compared values are of mixed type."
> > If everything is treated as a string when neither (1) nor (2), then
> > everything is well-defined and there is no confusion.
> >
> > Why is 'compare' not just (something equivalent to) tostring(a) <
> > tostring(b), violating the sentence cited above? Are there any
> > meaningful uses of the way 'compare' is now?
> >
> > I suspect that the intent was to create an order that is lexicographic
> > and numeric at the same time. As far as I see it, it just is not
> > possible the way it was tried: Lexicographically, 11<2. Numerically,
> > 2<11.
> >
> > I see three ways out: (1) Disallow mixed types of keys in tables given
> > to sortedkeys. (2) Always use lexicographic order in the case of mixed
> > key types. (3) Combine the orders by an order-theoretic sum. One
> > possiblity: Every value that is neither a number nor a string is smaller
> > than every number. Every number is smaller than every string. Otherwise
> > elements are compared as they are now. Another possiblity: Define an
> > order over all types, define an order for every type. Then sort by type
> > first, and in type second.
> >
> > (1) may not possible without breaking a lot. (2) has sort of a
> > discontinuity: E.g., add one numeric key to a table with only strings,
> > and the order changes globally. (3) contains well defined, very
> > predictable solutions without discontinuities.
> These sorters are not used when typesetting because there we also deal
> with languages and such.

That's good to know.

> If's mostly a convenience feature that makes sure we can handles
> hashes in a predictable order when needed (because hashes have random
> order, even per run) and it also tries to be efficient in doing so.

>From my current perspective, it is very questionable whether
'predictable' is something that I would attach to the compare function.
Even if 'predictable' only means 'it did not yet throw an error in the
face of values of mixed type, as Lua's < would', it may not be so in the
future, because compare is no strict partial order (10 < "2" < 3 < 10),
and therefore violates the specification of table.sort (and any linear
sorting in general). And even if table.sort does never detect this
violation, it will give different results on different permutations of
the same sequence.

Moreover, e.g. the example

  for n,key in next,table.sortedkeys(mixed) do
     io.write(string.format("[%2i] [%2s] (t:%6s) -> ā€œ%sā€\n", n, key, type(key), mixed[key]))

from https://www.contextgarden.net/Table_manipulation returns different
results at least on different runs.

If the only real purpose of compare is to make sure we can handle keys
of mixed types, then no sort at all is most efficient, although it gives
different results on different runs, as the current compare does, too.

If compare is not used when typesetting, then efficiency may not be
important, and something well-defined and total may be used.

> Normally keys are not mixed but when they are this is the way it's
> done. It's been so for more than a decade so unlikely to change.

That's fine.

I want to admit that I felt strong aversion when I first saw compare.
"Linear sorting" by violating the preconditions of linear sorting
strongly goes against my upbringing.

But ConTeXt/LuaTeX/LuaMetaTeX lives, thrives, and has quite some
history. I will meditate and learn to live with it and, after hereby
having peeked your attitude on such issues, bother neither me nor you
with something like that again. (Except it makes something explode.)

All the best

More information about the dev-context mailing list