Validation problem with PDF files produced by pdftex
Hi folks,

There is a recurring problem with PDF files produced by pdftex in France. Whenever PhD thesis documents are submitted to the French national archives, they have to be validated by an automated process which (apparently) relies on pdftk, and files compiled by pdftex are rejected.

The problem is described here: https://groups.google.com/forum/#!topic/comp.text.pdf/ZnobgbiiDZ4

French pdftex users will be eternally grateful to whoever fixes this :-)

Best regards,

--
Fabrice Popineau
-----------------------------
CentraleSupelec
Département Informatique
3, rue Joliot Curie
91192 Gif/Yvette Cedex
Direct line: +33 (0) 169851950
Switchboard: +33 (0) 169851212
------------------------------
A couple months ago (sad but true), Fabrice passed on (https://mailman.ntg.nl/pipermail/ntg-pdftex/2016-January/004066.html) the report from Denis (https://groups.google.com/forum/#!topic/comp.text.pdf/ZnobgbiiDZ4) that pdftex creates pdfs with "No document catalog dictionary" and this fails jhove validation.

But then, running the pdf through
  pdftk these.pdf output valid-these.pdf
makes it pass.

More details and a test document apparently at http://tex.stackexchange.com/questions/79947

Except that report implies that maybe this is about the EOL silliness that was fixed a couple years ago?

So, Denis, can you confirm that pdftex from TL'15 still does not generate good pdfs for you? (I imagine you are using the latest pdftex, but just in case ...)

Anyway, can someone compare the pdftex result with the pdftk result and discern the actual difference? (Sorry, I'd do it myself but I'm going to be away for a couple of days and time is getting short for TL'16.)

Thanks,
Karl
Karl Berry writes:

> A couple months ago (sad but true), Fabrice passed on (https://mailman.ntg.nl/pipermail/ntg-pdftex/2016-January/004066.html)

Karl, could you try phrasing stuff more carefully? http://permalink.gmane.org/gmane.emacs.devel/201972 shows Fabrice alive and kicking just yesterday.

> the report from Denis (https://groups.google.com/forum/#!topic/comp.text.pdf/ZnobgbiiDZ4) that pdftex creates pdfs with "No document catalog dictionary" and this fails jhove validation.
-- David Kastrup
Hi Karl,

On Mon, Mar 21, 2016 at 11:51:08PM +0000, Karl Berry wrote:
> A couple months ago (sad but true), Fabrice passed on (https://mailman.ntg.nl/pipermail/ntg-pdftex/2016-January/004066.html) the report from Denis (https://groups.google.com/forum/#!topic/comp.text.pdf/ZnobgbiiDZ4) that pdftex creates pdfs with "No document catalog dictionary" and this fails jhove validation.
But the consensus seemed to be that it was a JHOVE bug, see for example Patrick's message from 7 February to the dev-luatex list: https://mailman.ntg.nl/pipermail/dev-luatex/2016-February/005603.html
> Anyway, can someone compare the pdftex result with the pdftk result and discern the actual difference? (Sorry, I'd do it myself but I'm going to be away for a couple of days and time is getting short for TL'16.)

The actual difference was apparently pretty clear: the PDF string representing the banner contains a pair of balanced parentheses, such as

  /PTEX.Fullbanner (This is pdfTeX, Version 3.14159265-2.6-1.40.16 (TeX Live 2015/W32TeX) kpathsea version 6.2.1)

These are allowed by the PDF spec but JHOVE bails on them.

Best,
Arthur
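To illustrate the point about balanced parentheses, here is a minimal Python sketch of how a PDF literal string can be read by tracking nesting depth. The function names `read_literal_string` and `naive_read` are hypothetical, and `naive_read` is only an assumed example of how a reader could go wrong; it is not taken from JHOVE's code.

```python
from typing import Tuple


def read_literal_string(data: str, start: int) -> Tuple[str, int]:
    """Read a PDF literal string that begins at data[start] == '('.

    Returns the raw body (outer parentheses removed, escape sequences
    left undecoded) and the index just past the closing parenthesis.
    """
    if data[start] != "(":
        raise ValueError("literal string must start with '('")
    depth = 0
    i = start
    while i < len(data):
        ch = data[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "(":
            depth += 1  # unescaped '(' opens a nested level
        elif ch == ")":
            depth -= 1  # unescaped ')' closes the current level
            if depth == 0:
                return data[start + 1:i], i + 1
        i += 1
    raise ValueError("unterminated literal string")


def naive_read(data: str, start: int) -> str:
    """Hypothetical buggy reader: stops at the first ')' it finds."""
    end = data.index(")", start)
    return data[start + 1:end]


banner = ("/PTEX.Fullbanner (This is pdfTeX, Version 3.14159265-2.6-1.40.16 "
          "(TeX Live 2015/W32TeX) kpathsea version 6.2.1)")
start = banner.index("(")
body, end = read_literal_string(banner, start)
print(body)                       # full banner text, nested pair included
print(naive_read(banner, start))  # cut off right after "W32TeX"
```

The depth-tracking reader consumes the whole banner, while the naive one stops inside it and leaves the rest of the entry to be misparsed.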
On 22/03/16 00:31, Arthur Reutenauer wrote:
> The actual difference was pretty clear apparently: the PDF string representing the banner contains a pair of balanced parentheses, such as /PTEX.Fullbanner (This is pdfTeX, Version 3.14159265-2.6-1.40.16 (TeX Live 2015/W32TeX) kpathsea version 6.2.1) These are allowed by the PDF spec but JHOVE bails on them.

Is there some pdftex option to suppress this PTEX.Fullbanner string in the document catalog entirely, along with the Producer key in the info dictionary, or at least to make sure that such strings do not leak any version numbers or other time-variable information into the PDF?

This would be most useful for reproducible/deterministic-build applications, where you do not want the binary output of your compiler to change merely because of some embedded time stamp or version string. This is becoming important in some security applications, such as independently auditable binary distributions of open-source software.

https://reproducible-builds.org/

(It would also be useful if I simply did not want to advertise what exact software revision I used to produce a PDF. Not to mention as a workaround regarding the JHOVE bug mentioned above ...)

Markus

--
Markus Kuhn, Computer Laboratory, University of Cambridge
http://www.cl.cam.ac.uk/~mgk25/ || CB3 0FD, Great Britain
Dear Markus,
> Is there some pdftex option to suppress this PTEX.Fullbanner string in the document catalog entirely, along with the Producer key in the info dictionary, or at least to make sure that such strings do not leak any version numbers or other time-variable information into the PDF?

In TeX Live 2016, some changes have been made:

(1) If the environment variable SOURCE_DATE_EPOCH is set, for example SOURCE_DATE_EPOCH=1456304783, its value is used as the time, so the pdf can become reproducible.

(2) New primitives \pdfinfoomitdate and \pdftrailerid. If \pdfinfoomitdate=1, the date is omitted. With \pdftrailerid{somestring}, somestring is used for the trailer /ID; it can be empty, as in \pdftrailerid{}. These may also be used to obtain a reproducible pdf.

(3) A new primitive \pdfsuppressptexinfo. The default is \pdfsuppressptexinfo=0.
    [1] If \pdfsuppressptexinfo & 1 != 0, the line /PTEX.Fullbanner (This is pdfTeX, ... ...) is omitted from the output pdf.
    [2] If \pdfsuppressptexinfo & 2 != 0, the line /PTEX.FileName (... ...) is omitted from the output pdf.
    [3] If \pdfsuppressptexinfo & 4 != 0, the line /PTEX.PageNumber ... is omitted from the output pdf.
    [4] If \pdfsuppressptexinfo & 8 != 0, the line /PTEX.InfoDict ... ... is omitted from the output pdf.

Best,
Akira
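The bit tests described for \pdfsuppressptexinfo can be sketched in Python as follows. This is only an illustration of the bitmask arithmetic from the list above; the names `PTEX_FLAGS` and `suppressed_keys` are hypothetical and are not part of pdftex.

```python
# Bit value -> PTEX key that is omitted when that bit is set,
# following the four cases listed in the message above.
PTEX_FLAGS = {
    1: "/PTEX.Fullbanner",
    2: "/PTEX.FileName",
    4: "/PTEX.PageNumber",
    8: "/PTEX.InfoDict",
}


def suppressed_keys(value: int) -> list:
    """Return the PTEX keys omitted for a given \\pdfsuppressptexinfo value."""
    return [key for bit, key in PTEX_FLAGS.items() if value & bit]


for value in (0, 1, 5, 15):
    print(value, suppressed_keys(value))
```

Under this reading, the default value 0 suppresses nothing, a value with bit 1 set is needed to drop the /PTEX.Fullbanner line, and 15 (1 + 2 + 4 + 8) drops all four entries.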
On 22/03/16 at 00:51, Karl Berry wrote:

> A couple months ago (sad but true), Fabrice passed on (https://mailman.ntg.nl/pipermail/ntg-pdftex/2016-January/004066.html) the report from Denis (https://groups.google.com/forum/#!topic/comp.text.pdf/ZnobgbiiDZ4) that pdftex creates pdfs with "No document catalog dictionary" and this fails jhove validation.
>
> But then, running the pdf through pdftk these.pdf output valid-these.pdf makes it pass.
>
> More details and a test document apparently at http://tex.stackexchange.com/questions/79947 Except that report implies that maybe this is about the EOL silliness that was fixed a couple years ago?
>
> So, Denis, can you confirm that pdftex from TL'15 still does not generate good pdfs for you? (I imagine you are using the latest pdftex, but just in case ...)
As you've seen, the trouble seems to come more from JHOVE than from LaTeX, but:

- nevertheless, it turns out that all the files I tested with the current TL 2015 which JHOVE considers invalid become valid as soon as `\pdfobjcompresslevel 0' is added before `\documentclass',
- I'm interested in testing the new primitives described by Akira. So I ran:
  - `rsync -a --delete --exclude=.svn tug.org::tldevsrc ~/texlive-svn/'
  - `~/texlive-svn/Master/bin/i386-linux/pdflatex file'
  for a file containing `\pdfsuppressptexinfo 0' before `\documentclass', but this failed with:

  ┌────
  │ ! Undefined control sequence.
  │ l.2 \pdfsuppressptexinfo
  │                          0
  └────

What am I doing wrong?
> Thanks,
You're welcome. -- Denis
> - I'm interested in testing the new primitives described by Akira. So I ran:
>   - `rsync -a --delete --exclude=.svn tug.org::tldevsrc ~/texlive-svn/'
>   - `~/texlive-svn/Master/bin/i386-linux/pdflatex file'
>   for a file containing `\pdfsuppressptexinfo 0' before `\documentclass' but this failed with:
>
>   ┌────
>   │ ! Undefined control sequence.
>   │ l.2 \pdfsuppressptexinfo
>   │                          0
>   └────

If all you've done is fetch the new binary, you still have an old format; you need to rebuild it.

Best,
Arthur
On 22/03/16 at 16:07, Arthur Reutenauer wrote:
> If all you've done is fetch the new binary, you still have an old format; you need to rebuild it.

Sigh... I'm not familiar with this process and would prefer not to break my current "normal" TL 2015. Could you let me know how to proceed?

Best,
--
Denis
Denis - to get a new pdftex binary right now, you'd have to compile from source. If you don't want to do that, just wait a couple of weeks for the TL pretest to start. Then you can make an installation (separate from your regular installations) that way. -k
On 22/03/16 at 17:32, Karl Berry wrote:

> Denis - to get a new pdftex binary right now, you'd have to compile from source. If you don't want to do that, just wait a couple of weeks for the TL pretest to start. Then you can make an installation (separate from your regular installations) that way.
Okay: I'll wait for the TL pretest. -- Denis
On 22/03/16 at 15:54, Denis Bitouzé wrote:
> As you've seen, the trouble seems to come more from JHOVE than from LaTeX, but:
>
> - nevertheless, it turns out that all the files I tested with the current TL 2015 which JHOVE considers invalid become valid as soon as `\pdfobjcompresslevel 0' is added before `\documentclass',
> - I'm interested in testing the new primitives described by Akira. So I ran:
>   - `rsync -a --delete --exclude=.svn tug.org::tldevsrc ~/texlive-svn/'
>   - `~/texlive-svn/Master/bin/i386-linux/pdflatex file'
>   for a file containing `\pdfsuppressptexinfo 0' before `\documentclass', but this failed with:
>
>   ┌────
>   │ ! Undefined control sequence.
>   │ l.2 \pdfsuppressptexinfo
>   │                          0
>   └────
>
> What am I doing wrong?
Sorry for the delay.

AFAICS with TL 2016, to pass JHOVE's validation test:

- `\pdfobjcompresslevel 0' is necessary and sufficient,
- `\pdfsuppressptexinfo 0' is harmless but has no effect,

which means that if `\pdfsuppressptexinfo 0' is inserted at the very beginning of the `.tex' file:

- without `\pdfobjcompresslevel 0', the test fails,
- with `\pdfobjcompresslevel 0', the test doesn't fail.

All the best.
--
Denis
participants (7)
- Akira Kakuto
- Arthur Reutenauer
- David Kastrup
- Denis Bitouzé
- Fabrice Popineau
- Karl Berry
- Markus Kuhn