Hi folks,
There is a recurring problem in France with PDF files produced by either pdftex or luatex. Whenever PhD theses are submitted to the French national archives, they have to be validated by an automated process which (apparently) relies on pdftk. And files compiled by pdftex/luatex are rejected.
The problem is described here: https://groups.google.com/forum/#!topic/comp.text.pdf/ZnobgbiiDZ4
French pdftex/luatex users will be eternally grateful to whoever makes this work :-)
Best regards,
-- Fabrice
On 1/22/2016 10:24 AM, Fabrice Popineau wrote:
Hi folks,
There is a recurring problem with PDF files produced by either pdftex or luatex in France. Whenever PhD thesis documents are submitted to the French national archives, they have to be validated by some automated process which relies on pdftk (apparently). And files compiled by pdftex/luatex are rejected.
The problem is described here : https://groups.google.com/forum/#!topic/comp.text.pdf/ZnobgbiiDZ4
French pdftex/luatex users will have an eternal gratitude to whoever will make this work :-)
As far as I know, luatex (and pdftex) produce valid PDF files. Of course one can add all kinds of crap to a PDF file, like invalid objects and so on, but that is not our responsibility.
Now, to the file:
https://github.com/dbitouze/yathesis/blob/master/doc/latex/yathesis/master-s...
Acrobat preflight says that there should be an indirect object in an annotation (normally I'd expect a viewer not to bark at it, and Acrobat itself handles it OK). Those annotations are probably made by some macro package, and one should check that code.
But ... you mention "pdftex or luatex", yet this file was not produced by either of those engines: it was made by xdvipdfmx, which belongs to xetex.
So we're off the hook.
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
On 1/22/2016 1:59 PM, Hans Hagen wrote:
So we're off the hook.
Also, the file seems to be linearized, so some postprocessing has been applied ... that may just as well have messed up the file (or the checker cannot read linearized files).
Hans
On Fri, Jan 22, 2016 at 2:46 PM, Hans Hagen
also, the file seems to be a linearized file so some postprocessing has been applied ... that can as well have messed up the file (or the checker cannot read linearized)
_______________________________________________
dev-luatex mailing list
dev-luatex@ntg.nl
https://mailman.ntg.nl/mailman/listinfo/dev-luatex
As Hans said:
>d:\qpdf\bin\qpdf.exe --check e:\tmp\yathesis.pdf
checking e:\tmp\yathesis.pdf
PDF Version: 1.5
File is not encrypted
File is linearized
ERROR: linearized file contains an uncompressed object after a compressed one in a cross-reference stream
ERROR: page length mismatch for page 0: hint table = 29796; computed length = 25152 (offset = 4153)
WARNING: object count mismatch for page 0: hint table = 12; computed = 14
WARNING: page 0 has shared identifier entries
WARNING: page 0: shared object 3399: in hint table but not computed list
So... why is the file linearized? Does xetex linearize? But we can check, at least.
-- luigi
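For readers without qpdf at hand, a rough first check for linearization can be sketched in Python. This is only a heuristic and rests on one assumption from the PDF specification: a linearized file carries a linearization parameter dictionary with a /Linearized key near the very start of the file (within the first 1024 bytes). The helper name `looks_linearized` is made up for this illustration; unlike `qpdf --check`, it does not verify hint tables or cross-reference streams.

```python
import re

# The /Linearized key opens the linearization parameter dictionary,
# which is expected within the first 1024 bytes of the file.
_LINEARIZED = re.compile(rb"/Linearized\s+[0-9.]+")


def looks_linearized(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the head of the file carries a /Linearized entry."""
    return _LINEARIZED.search(pdf_bytes[:1024]) is not None
```

To use it, read the first kilobyte of the file in binary mode and pass it in, for example `looks_linearized(open("yathesis.pdf", "rb").read(1024))`.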
2016-01-22 13:59 GMT+01:00 Hans Hagen
But ... you mention "pdftex or luatex" but this file is not produced by any of those engines: it's made by xdvipdfmx which is something xetex.
So we're off the hook.
Oh, Denis may have been wrong about his own file, but I had this problem too, and I can guarantee that it was with a file compiled with pdftex. I had to use Acrobat to 'fix' the PDF file and make it acceptable for this archive system, which is mandatory for all PhD theses in France.
As this problem surfaced again on the GUTenberg mailing list, I decided to ask.
Here are my attempts with the attached test.tex file:
- without the pdfx package, the validator returns
  Message : No document catalog dictionary
- with the pdfx package, the validator returns
  Message : Lexical error
  which is annoying and may indicate some kind of error (but is it low level in pdftex, or rather in the pdfx package?)
- after using Acrobat and saving test.pdf as PDF/A-1b, the validator accepts it.
Fabrice
On Fri, Jan 22, 2016 at 3:34 PM, Fabrice Popineau < fabrice.popineau@supelec.fr> wrote:
Here are my attempts with the attached test.tex file:
- without the pdfx package, the validator returns
  Message : No document catalog dictionary
- with the pdfx package, the validator returns
  Message : Lexical error
  which is annoying and may indicate some kind of error (but is it low level in pdftex, or rather in the pdfx package?)
- after using Acrobat and saving test.pdf as PDF/A-1b, the validator accepts it.
OK, this is a LaTeX thing, so I think it's not a pdftex/luatex problem. Well, I hope. I will investigate.
-- luigi
On 1/22/2016 3:34 PM, Fabrice Popineau wrote:
Here are my attempts with the attached test.tex file:
- without the pdfx package, the validator returns
  Message : No document catalog dictionary
- with the pdfx package, the validator returns
  Message : Lexical error
  which is annoying and may indicate some kind of error (but is it low level in pdftex, or rather in the pdfx package?)
- after using Acrobat and saving test.pdf as PDF/A-1b, the validator accepts it.
Well, "no document catalog" is pretty hard to achieve unless one really messes up, so indeed it might be a macro package issue.
You can mail me the test file (use \pdfcompresslevel0), as I cannot process it here.
Hans
Hi,
I'm definitely sorry about the crappy pdf file that was referenced by the message I cited. I didn't try this file in particular. I cited the message only because it was describing the very same problem I observed too.
2016-01-22 16:27 GMT+01:00 Hans Hagen
well, no document catalog is pretty hard to achieve unless one really messes up so indeed it might be a macro package issue
'No document catalog dictionary' is what you get with the simplest latex file by default.
you can mail me the test file (use \pdfcompresslevel0) as i cannot process it here
Done. Apologies if the problem lies somewhere at the macro level. But I was pretty sure to find the needed expertise here :-)
Best regards,
-- Fabrice
2016-01-22 10:24 GMT+01:00 Fabrice Popineau
There is a recurring problem with PDF files produced by either pdftex or luatex in France. Whenever PhD thesis documents are submitted to the French national archives, they have to be validated by some automated process which relies on pdftk (apparently). And files compiled by pdftex/luatex are rejected.
The validation service is at http://facile.cines.fr/
It seems they want PDF/A-1b, since the failure icon links to http://facile.cines.fr/aide/pdf.jsf
But the table seems to allow normal PDF 1.5...
Best
Martin
On 1/26/2016 4:00 PM, Martin Schröder wrote:
The validation service is at http://facile.cines.fr/
It seems they want PDF/A-1b, since the failure icon links to http://facile.cines.fr/aide/pdf.jsf But the table seems to allow normal PDF 1.5...
The files (Fabrice sent some) are OK; the validator is the problem.
Hans Hagen
On 2016-01-26 at 16:00:48 +0100, Martin Schröder wrote:
The validation service is at http://facile.cines.fr/
It seems they want PDF/A-1b, since the failure icon links to http://facile.cines.fr/aide/pdf.jsf But the table seems to allow normal PDF 1.5...
Thanks for the link. I uploaded a PDF/A-1b file created with LuaTeX and, if a red cross means failure, it failed. This is strange because five other PDF/A-1b validators didn't complain. The pdfx package isn't involved in this case; I wrote the PDF/A-1b support entirely in Lua.
Only one validator (pdftron) complained about a TrueType font (CharisSIL):
"CIDSet in subset font is incomplete"
but I reverse-engineered the CIDSet entry and came to the conclusion that everything is compliant with the PDF/A-1b standard.
After all, if a particular validator complains and the reason can't be determined, the most reasonable way to proceed is to try to contact the guy who wrote this validator. One can never be sure that validators are perfect.
Fabrice, you said:
'No document catalog dictionary' is what you get with the simplest latex file by default.
If you compile with \pdfcompresslevel=0 you'll see that it exists. The question is why the validator can't find it. I suppose that its parser is more fussy than required by the standard.
Regards,
Reinhard
--
------------------------------------------------------------------
Reinhard Kotucha
Phone: +49-511-3373112
Marschnerstr. 25
D-30167 Hannover
mailto:reinhard.kotucha@web.de
------------------------------------------------------------------
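The check suggested here can also be scripted. The following is a minimal Python sketch, assuming the file was compiled without stream and object compression so that dictionaries are stored as plain text; with object streams enabled the catalog would not be visible to a plain byte search. The function name `catalog_offsets` is hypothetical.

```python
import re

# Matches the catalog's type entry as it appears in uncompressed output.
_CATALOG = re.compile(rb"/Type\s*/Catalog\b")


def catalog_offsets(pdf_bytes: bytes) -> list:
    """Return the byte offsets of every plain-text '/Type /Catalog' entry."""
    return [m.start() for m in _CATALOG.finditer(pdf_bytes)]
```

A non-empty result means the document catalog is present in readable form, which would point at the validator's parser rather than at the file.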
Hi,
Some news about this problem, and quite surprising news at that.
Denis has found a workaround floating around (French LaTeX newsgroup) and it seems to work.
Basically, the problem is triggered by the banner which is stored in the PDF:
/PTEX.Fullbanner (This is pdfTeX, Version 3.14159265-2.6-1.40.16 (TeX Live 2015/W32TeX) kpathsea version 6.2.1)
Once you replace the inner parentheses by brackets, the validator doesn't complain anymore.
So, the recipe is:
- compile with
  \pdfcompresslevel0
  \pdfobjcompresslevel0
- use
  \usepackage[a-1b]{pdfx}
  if you want to
- use pdflatex as per usual
- replace the ( ) in the banner by [ ]:
  sed -i '/^\/PTEX\.Fullbanner/{s/(/[/g;s/)/]/g;s/\[/(/;s/\]$/)/}' file.pdf
Shouldn't those parentheses be escaped in some way? Or is it the validator's parser which is buggy?
Fabrice
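For systems without sed, the same substitution can be sketched in Python. This is an illustration only, mirroring the sed command step by step; the function name `bracket_banner` is made up, and it assumes, like the sed version, that the banner sits on a line of its own starting with /PTEX.Fullbanner in an uncompressed file. Since every byte is replaced by exactly one byte, the file length and hence the cross-reference offsets stay unchanged.

```python
def bracket_banner(pdf_bytes: bytes) -> bytes:
    """Replace the inner ( ) of the /PTEX.Fullbanner string by [ ]."""
    lines = pdf_bytes.split(b"\n")
    for idx, line in enumerate(lines):
        if line.startswith(b"/PTEX.Fullbanner"):
            # Turn every parenthesis into a bracket ...
            line = line.replace(b"(", b"[").replace(b")", b"]")
            # ... then restore the outer pair that delimits the string.
            line = line.replace(b"[", b"(", 1)
            if line.endswith(b"]"):
                line = line[:-1] + b")"
            lines[idx] = line
    return b"\n".join(lines)
```

Read the PDF in binary mode, pass its contents through `bracket_banner`, and write the result back in binary mode.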
2016-01-27 0:11 GMT+01:00 Reinhard Kotucha
If you compile with \pdfcompresslevel=0 you'll see that it exists. The question is why the validator can't find it. I suppose that its parser is more fussy than required by the standard.
-- Fabrice Popineau ----------------------------- SUPELEC Département Informatique 3, rue Joliot Curie 91192 Gif/Yvette Cedex Tel direct : +33 (0) 169851950 Standard : +33 (0) 169851212 ------------------------------
Shouldn't those parenthesis be escaped in some way? Or is it the validator parser which is buggy?
The spec allows nested parentheses:
PDF Spec 1.7, 3.2.3 Literal Strings
A literal string is written as an arbitrary number of characters enclosed in parentheses. Any characters may appear in a string except unbalanced parentheses and the backslash, which must be treated specially. Balanced pairs of parentheses within a string require no special treatment.
Example from the spec:
(Strings may contain balanced parentheses ( ) and special characters (*!&}^% and so on).)
So I bet the validator is buggy.
Patrick
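The rule quoted from the spec amounts to counting nesting depth while scanning. A minimal Python sketch of such a scanner follows; `read_literal_string` is a hypothetical name, and the function returns the raw contents without decoding escape sequences. It is not how any particular validator is implemented, only an illustration of what a conforming parser has to do with the pdfTeX banner.

```python
from typing import Tuple


def read_literal_string(data: bytes, start: int = 0) -> Tuple[bytes, int]:
    """Return (raw contents, index just past the closing parenthesis)."""
    if data[start:start + 1] != b"(":
        raise ValueError("literal string must start with '('")
    depth = 1
    i = start + 1
    while i < len(data):
        c = data[i:i + 1]
        if c == b"\\":
            i += 2  # a backslash escapes the next character; skip both
            continue
        if c == b"(":
            depth += 1  # balanced inner parenthesis opens
        elif c == b")":
            depth -= 1
            if depth == 0:  # the outermost parenthesis closes the string
                return data[start + 1:i], i + 1
        i += 1
    raise ValueError("unbalanced parentheses in literal string")
```

A parser that stops at the first closing parenthesis instead would cut the banner short after "W32TeX", which is consistent with the kind of failure reported in this thread.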
2016-02-07 17:17 GMT+01:00 Patrick Gundlach
So I bet the validator is buggy.
Thanks Patrick, so yes, it is the validator (some version of JHOVE) that is buggy, and the problem is closed. At least there is an easy workaround.
-- Fabrice
participants (7)
- Fabrice Popineau
- Fabrice Popineau
- Hans Hagen
- luigi scarso
- Martin Schröder
- Patrick Gundlach
- Reinhard Kotucha