[NTG-pdftex] Deterministic PDFs (switch to disable addition of timestamps and random ID nonces)
Markus.Kuhn at cl.cam.ac.uk
Fri Jun 26 13:03:46 CEST 2015
On 6/26/2015 3:40 AM, Heiko Oberdiek wrote:
> pdfTeX calculates the non-deterministic ID values in "utils.c", function
> "printID". It uses the MD5 sum of the following
> data for the ID values:
> * the current time by calling function "time" (resolution is second),
> * the current working directory by calling "getcwd" and
> * the output file name.
The trailer dictionary content is defined on page 43 (Table 15) of
the PDF specification.
In addition, Section 14.4 (page 551) suggests an MD5 input string, similar
to the list Heiko gave above, to determine the ID value:
• The current time
• A string representation of the file’s location, usually a pathname
• The size of the file in bytes
• The values of all entries in the file’s document information dictionary
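
As a rough illustration of the recipe above, here is a minimal Python
sketch. The function name file_id, the UTF-8 encoding, the plain
concatenation order and the sample values are all my own assumptions;
the list only names the inputs, not a byte layout, and this is not what
pdfTeX's printID does.

```python
import hashlib


def file_id(timestamp, location, size, info_values):
    """Return an MD5 hex digest over the four suggested inputs.

    Hypothetical helper: the serialisation (UTF-8 text, concatenated
    in this order) is an assumption made for illustration only.
    """
    md5 = hashlib.md5()
    md5.update(str(timestamp).encode("utf-8"))   # the current time
    md5.update(location.encode("utf-8"))         # file location (pathname)
    md5.update(str(size).encode("utf-8"))        # file size in bytes
    for value in info_values:                    # info dictionary values
        md5.update(value.encode("utf-8"))
    return md5.hexdigest().upper()


# Made-up sample inputs: identical inputs give identical IDs,
# while a one-second change in the time gives a different ID.
a = file_id(1435316626, "/home/user/paper.pdf", 12345, ["Title", "Author"])
b = file_id(1435316626, "/home/user/paper.pdf", 12345, ["Title", "Author"])
c = file_id(1435316627, "/home/user/paper.pdf", 12345, ["Title", "Author"])
```

The time component alone is enough to make two otherwise identical
runs produce different IDs, which is the source of the non-determinism
discussed in this thread.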
But what exactly should or should not be fed into the ID-generating
hash function surely depends on workflow requirements. Some may
want the time in there, others not. Some may want the entire
source file in there, others not.
How about adding a new primitive that takes as input the string that
pdfTeX will feed into MD5 in order to generate the file's identifier?
Then the user could override the above default choice, e.g.
along the lines of
if I wanted the ID to be calculated based on the date, pathname
and content of the source file, for example. I could then make
the ID depend on whatever strings TeX has access to. In particular,
I could also use
to make it a constant, or
to make it only depend on the filename, etc.
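
To make the intended semantics concrete, here is a small Python model
of the idea, not pdfTeX syntax and not an existing primitive. The name
pdf_id, the id_string parameter and the way the default inputs are
joined are hypothetical; only the three default inputs (time, working
directory, output file name) come from Heiko's description.

```python
import hashlib
import os
import time


def pdf_id(output_name, id_string=None):
    """Model of the proposal: hash a user-supplied string if given.

    Hypothetical sketch. Without id_string, fall back to the default
    inputs (time in seconds, current directory, output file name),
    joined by simple concatenation, which is an assumption.
    """
    if id_string is None:
        id_string = f"{int(time.time())}{os.getcwd()}{output_name}"
    return hashlib.md5(id_string.encode("utf-8")).hexdigest().upper()


# An empty override string yields the same ID for every file and run.
constant_id = pdf_id("paper.pdf", id_string="")

# An override holding only the file name makes the ID depend on nothing else.
name_only_id = pdf_id("paper.pdf", id_string="paper.pdf")
```

With an override in place, neither the clock nor the working directory
enters the hash, so repeated runs give byte-identical IDs.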
Markus Kuhn, Computer Laboratory, University of Cambridge
http://www.cl.cam.ac.uk/~mgk25/ || CB3 0FD, Great Britain