On 7/21/2021 1:58 PM, Michal Vlasák wrote:
On Wed Jul 21, 2021 at 9:09 AM CEST, Hans Hagen wrote:
just a few remarks:
- dump sharing in luatex makes no sense, also because lua bytecode can be stored in the format and that is not portable .. for that reason byte swapping was removed at some later point in the project
Funnily enough, I looked at the code exactly because of this. As it turns out, after recent problems with format sharing between 32-bit Windows and 64-bit Linux, there is now an effort in TeX Live to ensure the portability of formats:
https://git.texlive.info/texlive/tree/Master/tlpkg/bin/tl-check-fmtshare
Of course the issue of unportable bytecode came up. As far as I know three LuaTeX formats store it in format files: ConTeXt (which wasn't, and probably won't be, checked by the script or by users), OpTeX and minim (a recent format, not even generated in TeX Live).
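To see why the stored bytecode is platform-bound, it is enough to look at the header of a precompiled chunk, which records the type sizes of the build that dumped it. A minimal sketch, assuming the Lua 5.3 header layout (the Lua version used by LuaTeX 1.x); the offsets follow 5.3's ldump.c:

    #include <stdio.h>

    /* Print the platform-dependent fields in the header of a Lua 5.3
       bytecode chunk (as produced by luac -o or string.dump).  Layout:
       4-byte signature, version byte, format byte, six LUAC_DATA
       sanity bytes, then the type sizes of the dumping build. */
    int main(int argc, char **argv) {
        unsigned char h[17];
        FILE *f;
        if (argc < 2 || (f = fopen(argv[1], "rb")) == NULL) return 1;
        if (fread(h, 1, sizeof(h), f) != sizeof(h)) return 1;
        printf("version byte       : 0x%02x\n", h[4]);
        printf("sizeof(int)        : %d\n", h[12]);
        printf("sizeof(size_t)     : %d\n", h[13]);
        printf("sizeof(Instruction): %d\n", h[14]);
        printf("sizeof(lua_Integer): %d\n", h[15]);
        printf("sizeof(lua_Number) : %d\n", h[16]);
        /* the bytes that follow hold a test lua_Integer (0x5678) and a
           test lua_Number (370.5); their raw bytes expose the byte
           order and float format of the machine that dumped the chunk */
        fclose(f);
        return 0;
    }

Running this on chunks from two different builds makes any mismatch obvious; the loader (lundump.c) refuses such chunks outright.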
context has always managed its own format generation, also because we operate on an engine axis too, and one never calls context via its format stub
I evaluated the possibility of byte swapping in the Lua (un)dumping by introducing a patch in TeX Live, but I don't think it is worth it:
- I personally wouldn't encourage _more_ format sharing between OS's and architectures in the future.
- The types used by Lua (long long, double, int) may not even be portable anyway.
- The "right" approach for portable dumping doesn't fit the current architecture (see the sketch below): https://commandcenter.blogspot.com/2012/04/byte-order-fallacy.html
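For reference, the approach that post advocates: fix the byte order of the stream and decode it arithmetically, byte by byte, so the host's endianness never enters into it. A minimal sketch (the helper name is made up):

    #include <stdint.h>

    /* Decode a 32-bit value that is stored little-endian in the
       stream, without ever asking what the host byte order is; the
       mirror image works for big-endian streams. */
    static uint32_t read_le32(const unsigned char *p) {
        return (uint32_t)p[0]
             | ((uint32_t)p[1] << 8)
             | ((uint32_t)p[2] << 16)
             | ((uint32_t)p[3] << 24);
    }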
even if one were to handle the byte order in the lua bytecode, the bytecode itself is not portable (and i'm not even sure how the luajit stuff fits in, because luajit is even more platform specific)
- byte swapping introduces overhead, and because formats were stored big-endian it was happening for the majority of users (intel), but the code was/is still there
I agree that, if anything, the more common little endian would have been a better choice.
the sharing made sense in the days when one ran tex from a dvd, in which case all binaries shared the same precooked formats (or on network shares serving multiple architectures), but afaik running from dvd was dropped and hardly anyone runs multiple platforms from one share (those who do can probably figure out some trick)
- as you mention, the allocation overhead is small, which is definitely true compared to byte swapping and all the decompression calls in between (loading the file database at startup probably takes more time, and in the early days it was definitely quite noticeable, more so than format loading); the level 3 compression gave the best trade-off
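For context: with zlib's gz* interface the compression level is chosen in the gzopen() mode string, so the level 3 trade-off is literally one character. A minimal sketch (the helper is hypothetical, not the actual dump code):

    #include <zlib.h>

    /* Write a block gzip-compressed at level 3, the trade-off
       mentioned above: the level is appended to the mode string. */
    int dump_block(const char *path, const void *data, unsigned len) {
        gzFile f = gzopen(path, "wb3");   /* "3" = compression level */
        if (f == NULL) return -1;
        if (gzwrite(f, data, len) != (int)len) { gzclose(f); return -1; }
        return gzclose(f);
    }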
Interestingly, I didn't find any really measurable slowdown from the byte swapping, possibly because the entire function was turned into a batch of SIMD instructions. (But to be fair, I didn't test with a huge format, and I usually compile LuaTeX with byte swapping disabled anyway.)
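The loop in question presumably has this shape; modern gcc and clang turn it into bswap/SIMD instructions at the usual optimization levels (a sketch, not the actual LuaTeX code):

    #include <stdint.h>
    #include <stddef.h>

    /* Swap the byte order of every 32-bit word in a dumped block.
       Compilers recognize this pattern and emit vectorized code,
       which is probably why the overhead is hard to measure. */
    void swap_words(uint32_t *w, size_t n) {
        for (size_t i = 0; i < n; i++)
            w[i] = (w[i] >> 24)
                 | ((w[i] >> 8)  & 0x0000ff00u)
                 | ((w[i] << 8)  & 0x00ff0000u)
                 | (w[i] << 24);
    }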
it is (or at least was) slower with the native microsoft compiler (the win32 binaries from akira) because that compiler is less aggressive in some optimizations (we could deduce that it plays safe in some areas of memory casting, especially combined with the gz decompression)
- the 'internal compiler error' mentioned by Taco rings a bell; it's a reason why saving/loading an 'int' often goes via a temporary variable, because compilers would optimize in a way that dumping ints embedded in more complex data structures gave issues
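A sketch of the workaround being described (hypothetical names, not the actual dump macros): the value is copied into a plain local before being written, instead of passing a pointer into the larger structure:

    #include <stdio.h>

    typedef struct node { int type; int subtype; } node;

    /* hypothetical dump helper: always writes from a plain local */
    static void dump_int(FILE *f, int x) {
        fwrite(&x, sizeof(x), 1, f);
    }

    static void dump_node(FILE *f, const node *n) {
        int tmp;
        tmp = n->type;    dump_int(f, tmp);  /* not fwrite(&n->type, ...) */
        tmp = n->subtype; dump_int(f, tmp);
    }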
- there are a few more places where redundant temp vars are used because during some operations memory can grow, which can make pointers that were already set invalid (some of those have been sorted out differently in the meantime)
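The hazard behind that, sketched with hypothetical names (assuming the array grows via realloc, as TeX's dynamic arrays effectively do):

    #include <stdlib.h>

    static int *mem = NULL;      /* grows while the program runs */
    static size_t mem_size = 0;

    /* growing may move the array: every int* taken before this call
       can dangle afterwards */
    static void grow_mem(size_t new_size) {
        int *p = realloc(mem, new_size * sizeof *mem);
        if (p != NULL) { mem = p; mem_size = new_size; }
    }

    static void bump(size_t i) {
        /* unsafe: int *p = &mem[i]; grow_mem(2 * mem_size); *p += 1; */
        int tmp = mem[i];        /* safe: copy the value first ...    */
        grow_mem(2 * mem_size);
        mem[i] = tmp + 1;        /* ... and reindex after the grow    */
    }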
Very interesting; hopefully the situation has improved since then.
- talking of performance, one of the interesting things in the beginning of development was that we noticed that different (incremental) versions performed differently; for instance, when math was opened up the machinery became really slow, as if we had crossed some boundary (the compilation order of specific code modules mattered too); but when i then updated my laptop it was fast again, not so much because of the faster cpu but because the cpu cache was larger; compiler optimization also kind of interfered (at that time ideas, experiments and binaries came and went on a daily basis, we had quite some fun)
- if performance is of concern: we also noticed (later on, when luajit entered the scene) that the settings for lua string hashing matter, and that luajit had pretty bad heuristics (tuned for urls; we published about that), so we used a different hash function there ...
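For reference, the knob being alluded to; this follows the shape of Lua 5.3's luaS_hash (simplified, not the exact engine code):

    #include <stddef.h>

    /* long strings are sampled with a stride of (len >> HASHLIMIT) + 1
       instead of hashed byte by byte; HASHLIMIT (LUAI_HASHLIMIT in the
       sources, default 5) is the kind of tuning setting meant above */
    #define HASHLIMIT 5

    unsigned int str_hash(const char *str, size_t l, unsigned int seed) {
        unsigned int h = seed ^ (unsigned int)l;
        size_t step = (l >> HASHLIMIT) + 1;
        for (; l >= step; l -= step)
            h ^= ((h << 5) + (h >> 2) + (unsigned char)str[l - 1]);
        return h;
    }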
Thank you for your insights and caring about performance!

one observation is that using macros instead of functions for performance (i think there are still a few places) makes little sense in a program like tex where one jumps over memory space all the time (compilers are quite okay in optimizing), but there can be differences between versions of e.g. gcc

in general, loss of performance in a tex engine is more due to the way macros are composed (or user styles for that matter)

another one is the performance of the console, i.e. the kind of font, buffering and refresh delay defaults (i noticed that linux has small delays, so that's the fastest; the new windows terminal is also fast) .. now that one is really measurable .. just try running with the log piped to a file (all understandable) .. squeezing microseconds out of the binary can easily be nilled that way

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------