Hello,
when NO_DUMP_SHARE is not defined and the system is little-endian, all
dumped multi-byte values are byte-swapped by the function "swap_items"
in "tex/texfileio.c", so that format files can be shared between
architectures.
The function originates in Web2C's "texmfmp.c", and it seems that it was
added to LuaTeX in version 1.08, when the source was converted from CWeb
(or am I reading the history wrong?).
However, there are two differences between Web2C's and LuaTeX's
"swap_items":
1) LuaTeX also supports 12-byte values.
and more importantly:
2) LuaTeX essentially does this:
- allocate temporary array
- copy input to temporary array
- [common code with web2c]
- copy temporary array back to input to serve as output
- free temporary array
The copying seems redundant, and it causes many small allocations (~350K
allocations for a ~800K format file), most of them only 4 bytes long.
Curiously the gcc -O2 optimizer doesn't catch this even though it is a
static function (and changing xmalloc/xfree to the "intrinsic"
malloc/free doesn't help it). Maybe the possible unsigned int overflow
prevents the optimization? Or am I missing some side effect/purpose of
the copying/allocating?
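
To illustrate the point, here is a minimal sketch of the 4-byte case only.
It is not the LuaTeX code: the names swap4_in_place, swap4_via_copy and
check_swap4_equivalence are made up for this example, and plain malloc/free
stand in for xmalloc/xfree. Under those assumptions it suggests that
swapping in place yields the same bytes as the copy, swap, copy back
sequence:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the 4-byte case: reverse each item in place. */
void swap4_in_place(char *p, int nitems)
{
    while (nitems-- > 0) {
        char temp;
        temp = p[0]; p[0] = p[3]; p[3] = temp;
        temp = p[1]; p[1] = p[2]; p[2] = temp;
        p += 4;
    }
}

/* Hypothetical stand-in for the current behaviour:
   allocate, copy in, swap the copy, copy back, free. */
void swap4_via_copy(char *pp, int nitems)
{
    size_t total = (size_t) nitems * 4;
    char *q = malloc(total);
    if (q == NULL)
        abort();
    memcpy(q, pp, total);
    swap4_in_place(q, nitems);
    memcpy(pp, q, total);
    free(q);
}

/* Both variants should leave identical bytes in the caller's buffer. */
void check_swap4_equivalence(void)
{
    char a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    char b[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    swap4_in_place(a, 2);
    swap4_via_copy(b, 2);
    assert(memcmp(a, b, sizeof a) == 0);
}
```

The only observable difference in this sketch is the extra allocation and
the two memcpy calls per invocation.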
See my proposal below.
(Note that in the LuaTeX repository --disable-dump-share is the default,
while it isn't in TeX Live, I think.)
Michal Vlasák
--- a/tex/texfileio.c
+++ b/tex/texfileio.c
@@ -1125,13 +1125,9 @@ static gzFile gz_fmtfile = NULL;
*/
-static void swap_items(char *pp, int nitems, int size)
+static void swap_items(char *p, int nitems, int size)
{
char temp;
- unsigned total = (unsigned) (nitems * size);
- char *q = xmalloc(total);
- char *p = q;
- memcpy(p,pp,total);
/*tex
Since `size' does not change, we can write a while loop for each case,
@@ -1201,8 +1197,6 @@ static void swap_items(char *pp, int nitems, int size)
default:
FATAL1("Can't swap a %d-byte item for (un)dumping", size);
}
- memcpy(pp,q,total);
- xfree(q);
}
#endif