I received the following message from a publisher: [The PDF generator] you used flattens the file such that it has no meta tags whatsoever. We need the tags that give page separations, page count, header info, etc. How do I get those meta tags enabled? Would setting \interaction[state=start] do what's needed? Running minimals-beta. Thanks, Bart
On Mon January 19 2009 10:06:37 pm Bart C. Wise wrote:
I received the following message from a publisher:
[The PDF generator] you used flattens the file such that it has no meta tags whatsoever. We need the tags that give page separations, page count, header info, etc.
How do I get those meta tags enabled? Would setting \interaction[state=start] do what's needed?
Running minimals-beta. Running ConTeXt/LuaTeX.
Bart
On Tue, Jan 20, 2009 at 6:53 AM, Bart C. Wise
Running minimals-beta. Running ConTeXt/LuaTeX.
Hi bart, I can't help you but I remember this http://tug.ctan.org/tex-archive/macros/latex/contrib/pdfx -- luigi
2009/1/20 Bart C. Wise
[The PDF generator] you used flattens the file such that it has no meta tags whatsoever. We need the tags that give page separations, page count, header info, etc.
I call that BS: I've never heard of such "meta tags" for PDF giving the page count. The page count is in the root /Pages object per PDF specification. This "meta tags" BS sounds suspiciously like PostScript DSC. Best Martin
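Martin's point about the root /Pages object can be illustrated with a minimal sketch. It assumes a classic, uncompressed PDF whose objects all have generation 0 (no object streams, no encryption); the helper names `build_minimal_pdf`, `find_object`, and `page_count` are hypothetical, and a real workflow would use a proper PDF library rather than regular expressions.

```python
import re


def build_minimal_pdf(n_pages: int) -> bytes:
    """Build a tiny uncompressed PDF with empty pages (illustration only)."""
    kids = " ".join(f"{3 + i} 0 R" for i in range(n_pages))
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        f"<< /Type /Pages /Kids [{kids}] /Count {n_pages} >>".encode(),
    ]
    for _ in range(n_pages):
        objects.append(b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>")

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object
        out += f"{num} 0 obj\n".encode() + body + b"\nendobj\n"

    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode()
    out += f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n".encode()
    out += f"startxref\n{xref_pos}\n%%EOF\n".encode()
    return bytes(out)


def find_object(pdf: bytes, number: int) -> bytes:
    """Return the body of object `number` (generation 0 assumed)."""
    match = re.search(rb"(?<!\d)%d 0 obj(.*?)endobj" % number, pdf, re.S)
    if match is None:
        raise ValueError(f"object {number} not found")
    return match.group(1)


def page_count(pdf: bytes) -> int:
    """Follow trailer /Root -> catalog /Pages -> /Count."""
    root = re.search(rb"/Root\s+(\d+)\s+\d+\s+R", pdf)
    catalog = find_object(pdf, int(root.group(1)))
    pages_ref = re.search(rb"/Pages\s+(\d+)\s+\d+\s+R", catalog)
    pages = find_object(pdf, int(pages_ref.group(1)))
    return int(re.search(rb"/Count\s+(\d+)", pages).group(1))


if __name__ == "__main__":
    print(page_count(build_minimal_pdf(264)))
```

The count comes from the document structure itself, so no extra "meta tags" are involved.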
I call that BS: I've never heard of such "meta tags" for PDF giving the page count. The page count is in the root /Pages object per PDF specification.
I still suspect Bart's publisher means the tags that are part of Tagged PDF, even if he doesn't know their actual name and clearly has the wrong idea of what they really contain. The bad news is that TeX's support for Tagged PDF is very weak. There have been some experiments by Han The Thanh recently, but no general progress that I am aware of. In any case, generating fully tagged PDF would need a lot of collaboration between the engine and the macro package.
This "meta tags" BS sounds suspiciously like PostScript DSC.
A bit, but the guy also asks for "header information", which sounds more like the kind of information you put between PDF's BDC / EMC operators. Arthur
On Tue, Jan 20, 2009 at 12:10 PM, Arthur Reutenauer <arthur.reutenauer@normalesup.org> wrote:
I call that BS: I've never heard of such "meta tags" for PDF giving the page count. The page count is in the root /Pages object per PDF specification.
I still suspect Bart's publisher means the tags that are part of Tagged PDF, even if he doesn't know their actual name, and has clearly a wrong idea of what they really contain.
To avoid confusion (at least for me): XMP and Tagged PDF are different things. -- luigi
On Tue January 20 2009 4:33:27 am luigi scarso wrote:
to avoid confusione (at least to me): xmp and Tagged pdf are different things.
Thanks to all for the information so far, although from all indications, it doesn't look promising.
I talked to the publisher again and he said that he would send me the exact error message, but I have not received it yet. But he did say that his printing shop wants the ability to download just the header information from a PDF rather than the whole PDF file, which may be up to 80 MB. From that header information they will have the ability to render individual pages rather than the whole document. For example, they could request page 264 and render that single page as a JPEG.
Note that this is information from the publisher, not the printing shop that is doing the work, so technically, the publisher's jargon may be inaccurate, but a basic understanding of the needed functionality is there.
When I get specific information from the printing shop, I'll pass it along.
As a side note, I have published with them in the past, and this seems to be a recent change on their part. So I may be able to talk my way into letting them take the PDF file without the tagged information.
But needless to say, I'm very concerned. If tagged PDF support is not available in ConTeXt/LuaTeX, I feel that difficulties are either here now, or at best, looming on the horizon.
Thanks so much, Bart
When I get specific information from the printing shop, I'll pass it along.
OK, I'm really interested in it. -- luigi
2009/1/20 Bart C. Wise
I talked to the publisher again and he said that he would send me the exact error message, but I have not received it yet. But he did say that his printing shop wants the ability to download just the header information from a pdf rather than the whole pdf file which may be up to 80 mbytes. From that header information they will have the ability to render individual pages rather than the whole document. For example, they could request page 264 and render that single page as a jpeg.
Sounds like they are talking about linearized PDF. pdfopt from Ghostscript can generate that, and of course so can Acrobat. Best Martin
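A minimal sketch of calling pdfopt from a script, assuming a Ghostscript installation that still ships the `pdfopt` utility and that it takes an input and an output file name. The function names and file names are hypothetical.

```python
import shutil
import subprocess


def pdfopt_command(source: str, target: str) -> list[str]:
    """Build the command line: pdfopt <input.pdf> <output.pdf>."""
    return ["pdfopt", source, target]


def linearize(source: str, target: str) -> bool:
    """Run pdfopt if it is on PATH; return False when it is not installed."""
    if shutil.which("pdfopt") is None:
        return False
    subprocess.run(pdfopt_command(source, target), check=True)
    return True


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(pdfopt_command("book.pdf", "book-linearized.pdf")))
```

The output file must differ from the input file, since the tool writes a reordered copy rather than editing in place.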
2009/1/20 Bart C. Wise
I talked to the publisher again and he said that he would send me the exact error message, but I have not received it yet. But he did say that his printing shop wants the ability to download just the header information from a pdf rather than the whole pdf file which may be up to 80 mbytes. From that header information they will have the ability to render individual pages rather than the whole document. For example, they could request page 264 and render that single page as a jpeg.
This sounds like "web optimized" PDFs. Those contain a second object index at the start ("normal" PDFs have it at the end), so a *browser* can request selected pages from the *webserver* without loading the whole document.
It's meant as a web technology, and I never heard of anyone using it in a print workflow. But it's not impossible.
That has *nothing* to do with tagged PDF! "Tagged" is a technology to enable re-flowing text content for e.g. small devices, or extracting content for alternative readers, e.g. screen readers. PDFs for print should *not* be tagged in this way, for it can confuse a print workflow.
Print shops should adhere to printing standards like PDF/X-1a and PDF/X-3, and these never need web optimization or tagging!
Greetlings, Hraban (printing engineer and PDF workflow techie)
On Tue January 20 2009 7:06:58 am Henning Hraban Ramm wrote:
Printshops should adhere to printing standards like PDF/X-1a and PDF/X-3 - and these never need web optimization or tagging!
Excuse the ignorance, but does LuaTeX produce a PDF based on the PDF/X-1a and PDF/X-3 standards? Bart
On Tue, Jan 20, 2009 at 3:30 PM, Bart C. Wise
Excuse the ignorance, but does LuaTeX produce a PDF based on the PDF/X-1a and PDF/X-3 standards?
For Ghostscript: http://pages.cs.wisc.edu/~ghost/doc/svn/Ps2pdf.htm
For pdfTeX: http://tug.ctan.org/tex-archive/macros/latex/contrib/pdfx
I'm not sure if LuaTeX can produce this kind of PDF. -- luigi
2009/1/20 Bart C. Wise
Printshops should adhere to printing standards like PDF/X-1a and PDF/X-3 - and these never need web optimization or tagging! Excuse the ignorance, but does LuaTeX produce a PDF based on the PDF/X-1a and PDF/X-3 standards?
Not by itself. You have to make sure on your own that your embedded PDFs are OK. TeX doesn't set any PDF/X "markers"; since it doesn't check embedded material, that would be no good idea anyway. But you can easily prepare PDFs within the limits of PDF/X-1a (e.g. only CMYK data, no animations) using TeX. PDF/X-3 is also possible (I guess), but harder (you need the right color profiles everywhere). In my experience, too many print shops can't handle PDF/X-3 anyway... Greetlings, Hraban
But he did say that his printing shop wants the ability to download just the header information from a pdf rather than the whole pdf file which may be up to 80 mbytes.
OK. That's not Tagged PDF. Tagged PDF's main features focus on accessibility, adding information for the visually impaired (you can, for example, tag some text as part of the page header, by contrast to the page body: an application that reads the document out loud would know not to read that part). It also allows better archiving (the PDF/A standard). All concerns very distinct from the needs of publishers.
I'm just learning about XMP (Extensible Metadata Platform) which Luigi mentioned, but it doesn't really look like it contains the information you mention (although you can apparently add all sorts of metadata, including images).
Actually, the kind of information the printing shop asks for is available in any PDF file in a straightforward way: the very format has been designed so that all the PDF objects can be accessed directly with extreme efficiency (there is a cross-reference table with the byte offsets to every object inside the file). Individual pages are objects in a PDF file; they contain references to the resources needed to render them (fonts, images, etc.), so the basic functionality to render each page individually is already present in the format. And it's been there from day one -- which is, by the way, the reason why the insides of a PDF file look so undecipherable to the human eye: it's designed to be efficient to process automatically, not to be read by a programmer. By contrast, an XML-based format would be (somewhat) more human-friendly, but much slower to parse.
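The cross-reference lookup Arthur describes can be sketched roughly as follows. This assumes a classic PDF with a single plain-text `xref` section (no cross-reference streams, no incremental updates); `build_pdf`, `read_xref`, and `read_object` are hypothetical helper names, and the tiny file is built in memory only so the example is self-contained.

```python
import re


def build_pdf(bodies: list[bytes]) -> bytes:
    """Assemble a tiny PDF and its cross-reference table (illustration only)."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(bodies, start=1):
        offsets.append(len(out))
        out += b"%d 0 obj\n%s\nendobj\n" % (num, body)
    xref_pos = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(bodies) + 1)
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(bodies) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


def read_xref(pdf: bytes) -> dict[int, int]:
    """Map object number -> byte offset, starting from the end of the file."""
    tail = pdf[-1024:]
    pos = int(re.search(rb"startxref\s+(\d+)\s+%%EOF", tail).group(1))
    header = re.match(rb"xref\s+(\d+)\s+(\d+)\s*\n", pdf[pos:])
    first, count = int(header.group(1)), int(header.group(2))
    start = pos + header.end()
    table = {}
    for i in range(count):
        entry = pdf[start + 20 * i : start + 20 * (i + 1)]
        if entry[17:18] == b"n":  # "n" marks an object in use, "f" a free slot
            table[first + i] = int(entry[:10])
    return table


def read_object(pdf: bytes, table: dict[int, int], number: int) -> bytes:
    """Jump straight to one object without scanning the rest of the file."""
    start = table[number]
    end = pdf.index(b"endobj", start) + len(b"endobj")
    return pdf[start:end]


if __name__ == "__main__":
    pdf = build_pdf([
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
    ])
    table = read_xref(pdf)
    print(read_object(pdf, table, 3).decode())
```

Only the file tail and the requested object are read, which is the direct-access property described above.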
There's a variation on this basic feature: if you look at a PDF file over the Internet, the cross-reference table isn't conveniently located because it is at the very end of the file, so you need to download the entire file before your PDF viewer can start displaying it. (I think the argument behind that design decision was that a PDF-producing application only knows the entire list of objects at the end of the first pass, and can thus output the whole file sequentially in a single pass. Of course that clashes directly with the needs of PDF-consuming applications.) To circumvent this, Adobe devised a special type of object that contains the same information as the cross-reference table, which you can put at the very beginning of the file, together with the material needed to render the first pages. This is Linearized PDF (sometimes, confusingly enough, called "optimized" PDF). It's rather unlikely that it'd be what your printer wants (I suppose the file is already available on disk somewhere), but in any case, Ghostscript can produce it with the utility pdfopt. ConTeXt isn't able to produce it; it has been ruled that it was beyond the scope of pdfTeX and LuaTeX.
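Whether a given file is linearized can be checked with a rough heuristic: a linearized file starts with a parameter dictionary containing a `/Linearized` key, which should appear within the first 1024 bytes. The sketch below only looks for that key and does not validate the dictionary (for instance, it does not compare `/L` against the real file length); the function name and the sample bytes are made up for illustration.

```python
import re


def is_linearized(data: bytes) -> bool:
    """Heuristic: does the first object's dictionary contain /Linearized?"""
    head = data[:1024]
    first_dict = re.search(rb"\d+\s+\d+\s+obj\s*<<(.*?)>>", head, re.S)
    return first_dict is not None and b"/Linearized" in first_dict.group(1)


# Made-up file beginnings, for illustration only.
SAMPLE_LINEARIZED = (
    b"%PDF-1.4\n1 0 obj\n"
    b"<< /Linearized 1 /L 2048 /O 3 /E 900 /N 2 /T 1900 >>\nendobj\n"
)
SAMPLE_PLAIN = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"

if __name__ == "__main__":
    print(is_linearized(SAMPLE_LINEARIZED), is_linearized(SAMPLE_PLAIN))
```

For a file on disk, reading only the first kilobyte is enough for this check, e.g. `is_linearized(open(path, "rb").read(1024))`.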
When I get specific information from the printing shop, I'll pass it along.
I'm interested, too.
But needless to say, I'm very concerned. If tagged pdf support is not available in ConTeXt/LuaTeX, I feel that difficulties are either here now, or at best, looming on the horizon.
Why? There's progress made every day. Tagged PDF is indeed a problem for the moment, but it's clearly not the feature your printer asks for, and as a rule, you can be sure that if some functionality is essential to publishers, it will be added quickly to ConTeXt :-) Arthur
I'm just learning about XMP (Extensible Metadata Platform) which Luigi mentioned, but it doesn't really look like it contains the information you mention (although you can apparently add all sort of metadata, including images).
Yes, and XMP is not only for PDF: http://labs.adobe.com/technologies/xmplibrary/ -- luigi
On Tuesday 20 January 2009 08:19:04 am Arthur Reutenauer wrote:
But needless to say, I'm very concerned. If tagged pdf support is not available in ConTeXt/LuaTeX, I feel that difficulties are either here now, or at best, looming on the horizon.
Why? There's progress made every day. Tagged PDF is indeed a problem for the moment, but it's clearly not the feature your printer asks for, and as a rule, you can be sure that if some functionality is essential to publishers, it will be added quickly to ConTeXt :-)
Thanks again to all for the responses. The information has been very enlightening. I have sent an optimized (badly named, I know) PDF file off to the publisher and I'm waiting for his response. From all indications on this thread, I'm somewhat optimistic that it will solve the problem. I'll let you know what I hear back. Thanks so much again, Bart
On 20.01.2009 at 16:19, Arthur Reutenauer wrote:
Tagged PDF is indeed a problem for the moment, but it's clearly not the feature your printer asks for, and as a rule, you can be sure that if some functionality is essential to publishers, it will be added quickly to ConTeXt :-)
Hi Hans, these wise words were written 1 1/2 years ago ... and publishers do indeed ask for that more and more often! For me, the only way to do tagging is with Acrobat Pro. But as LuaTeX has made so much progress in the last 18 months, maybe there is also some breakthrough on this topic? Steffen
participants (6)
- Arthur Reutenauer
- Bart C. Wise
- Henning Hraban Ramm
- luigi scarso
- Martin Schröder
- Steffen Wolfrum