Hello,

when NO_DUMP_SHARE is not defined and the system is little-endian, all
dumped multi-byte values are byte-swapped by the function "swap_items" in
"tex/texfileio.c", so that format files can be shared between
architectures.

The function originates in Web2C's "texmfmp.c", and it seems that it was
added to LuaTeX in version 1.08, when the source was converted from CWeb
(or am I reading the history wrong?).

However, there are two differences between Web2C's and LuaTeX's
"swap_items":

1) LuaTeX also supports 12-byte values.

and, more importantly:

2) LuaTeX essentially does this:

    - allocate temporary array
    - copy input to temporary array
    - [common code with web2c]
    - copy temporary array back to input to serve as output
    - free temporary array

This all seems redundant and causes many small allocations (~350K
allocations for a ~800K format file), because most allocations are of only
4 bytes.

Curiously, the gcc -O2 optimizer doesn't catch this even though it is a
static function (and changing xmalloc/xfree to the "intrinsic" malloc/free
doesn't help it). Maybe the possible unsigned int overflow prevents the
optimization?

Or am I missing some side effect/purpose of the copying/allocating?

See my proposal below.

(Note that in the LuaTeX repository --disable-dump-share is the default,
while it isn't in TeX Live, I think.)

Michal Vlasák

--- a/tex/texfileio.c
+++ b/tex/texfileio.c
@@ -1125,13 +1125,9 @@ static gzFile gz_fmtfile = NULL;
 
 */
 
-static void swap_items(char *pp, int nitems, int size)
+static void swap_items(char *p, int nitems, int size)
 {
     char temp;
-    unsigned total = (unsigned) (nitems * size);
-    char *q = xmalloc(total);
-    char *p = q;
-    memcpy(p,pp,total);
     /*tex
 
         Since `size' does not change, we can write a while loop for each case,
@@ -1201,8 +1197,6 @@ static void swap_items(char *pp, int nitems, int size)
         default:
             FATAL1("Can't swap a %d-byte item for (un)dumping", size);
     }
-    memcpy(pp,q,total);
-    xfree(q);
 }
 
 #endif
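
For illustration only, here is a small standalone sketch of the point above. It is not the LuaTeX or Web2C code: the names swap_in_place, swap_via_copy and swap_variants_agree are made up for this example, the per-size switch of the real function is replaced by a generic loop, and plain malloc/free stand in for xmalloc/xfree. Under those assumptions, it shows that the allocate/copy/swap/copy-back/free sequence leaves the buffer in the same state as swapping directly in the caller's buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reverse the bytes of each `size`-byte item,
   working directly in the caller's buffer (no allocation). */
static void swap_in_place(char *p, int nitems, int size)
{
    assert(nitems >= 0 && size > 0);
    while (nitems-- > 0) {
        /* Exchange bytes from both ends of the item towards the middle. */
        for (int i = 0, j = size - 1; i < j; i++, j--) {
            char temp = p[i];
            p[i] = p[j];
            p[j] = temp;
        }
        p += size;
    }
}

/* Hypothetical helper mirroring the copy-based flow described above:
   allocate, copy in, swap, copy back, free.
   Returns 0 on success and -1 if the allocation fails. */
static int swap_via_copy(char *pp, int nitems, int size)
{
    assert(nitems >= 0 && size > 0);
    size_t total = (size_t) nitems * (size_t) size;
    if (total == 0)
        return 0;                   /* nothing to swap */
    char *q = malloc(total);
    if (q == NULL)
        return -1;
    memcpy(q, pp, total);           /* copy input to temporary array */
    swap_in_place(q, nitems, size); /* the swapping proper */
    memcpy(pp, q, total);           /* copy temporary array back */
    free(q);
    return 0;
}

/* Returns 1 if both variants produce identical bytes for three 4-byte items. */
static int swap_variants_agree(void)
{
    char a[12] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    char b[12];
    memcpy(b, a, sizeof a);
    swap_in_place(a, 3, 4);
    if (swap_via_copy(b, 3, 4) != 0)
        return 0;
    return memcmp(a, b, sizeof a) == 0;
}
```

The only observable difference in this sketch is the extra allocation and the two memcpy calls per invocation, which is what adds up when the function is called once per dumped 4-byte item.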