On Thu, Nov 17, 2005 at 11:39:06AM +0100, Hans Hagen wrote:
Martin ??? wrote:
On 2005-11-17 09:31:19 +0100, Taco Hoekwater wrote:
These are mostly the required page objects. Three objects are used per actual (totally empty) page:
13 0 obj << /Length 0 >> stream endstream endobj
This one is even longer with compresslevel > 0. It would be nice to not compress empty streams, but I think that would be too difficult to implement and isn't needed very often.
I think this is the task of a separate optimizer that tries different compression methods and chooses the best one.
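For illustration (not part of the original exchange): pdfTeX's stream compression is zlib-based, and zlib output for empty input still carries header and checksum bytes, so a compressed empty stream is strictly longer than the uncompressed `/Length 0` one. A minimal Python check:

```python
import zlib

empty = b""
compressed = zlib.compress(empty)

# zlib adds a 2-byte header, an (empty) deflate block, and a 4-byte
# Adler-32 checksum, so "compressing" nothing still costs bytes.
print(len(empty))       # 0
print(len(compressed))  # 8
```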
couldn't that be a null object then ?
12 0 obj << /Type /Page /Contents 13 0 R /Resources 11 0 R /MediaBox [0 0 595.2756 841.8898] /Parent 7 0 R >> endobj 11 0 obj << /ProcSet [ /PDF ] >> endobj
And 11 and 13 are created for every page. :-(
11 could simply be empty (or null) for empty pages. Looking at , it doesn't seem too difficult to optimize for empty Resources. Of course, the question is: How often do we have empty /Resources? Normally they at least have a /Font entry. If we start optimizations like these, it would be nice to move the /MediaBox to the root object (or pages) and write it only for different-sized pages (Hans: ConTeXt writes /TrimBox and /CropBox on every page (even if they are always the same); adding them to pdfpagesattr instead would save quite some space -- look at pdftex-a.pdf).
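A sketch of that /MediaBox suggestion (hypothetical helper, not pdfTeX code): write the box once on the /Pages root and emit a per-page /MediaBox only when a page's size differs, relying on PDF's attribute inheritance:

```python
# Sketch only -- hypothetical serializer, not actual pdfTeX code.
DEFAULT_BOX = [0, 0, 595.2756, 841.8898]  # the size seen in object 12

def page_dict(contents, resources, parent, mediabox=None):
    """Serialize a /Page object, omitting /MediaBox when the page
    has the default size and can inherit it from the /Pages root."""
    d = (f"<< /Type /Page /Contents {contents} 0 R"
         f" /Resources {resources} 0 R /Parent {parent} 0 R")
    if mediabox is not None and mediabox != DEFAULT_BOX:
        d += " /MediaBox [" + " ".join(str(v) for v in mediabox) + "]"
    return d + " >>"

print(page_dict(13, 11, 7))                    # inherits the box
print(page_dict(23, 11, 7, [0, 0, 612, 792]))  # letter-sized page writes its own
```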
context can have mixed page sizes in one document (which i need -)
Isn't there a standard that forbids inherited properties and requires /MediaBox to be set in each page object?
It is a bit wasteful to keep those in the indirect objects table forever, but I am not sure if it is doable to flush them right away. (CC ntg-pdftex)
I don't think that optimizations like these are generally useful, as they are seldom needed and make the code more complex. When I look at a typical result of ConTeXt or hyperref, they seem unneeded.
indeed; we can have a 'nice to-do' list for that; however, i think that the procset can safely be removed (not used by viewers anyway)
Btw: Is there a tool that compresses a pdf by replacing identical objects with references?
acrobat professional?
in pdftex it would mean calculating a checksum for each object before flushing it; it may slow down things a bit
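A minimal sketch of such checksum-based duplicate detection (in Python; this is just the idea, not how pdfTeX stores objects):

```python
import hashlib

def dedup(objects):
    """Map each duplicate PDF object to the first object with the
    same serialized content; returns {obj_num: canonical_obj_num}."""
    seen = {}   # checksum -> first object number with that content
    alias = {}  # duplicate object number -> canonical object number
    for num, body in objects.items():
        h = hashlib.sha256(body).hexdigest()
        if h in seen:
            alias[num] = seen[h]
        else:
            seen[h] = num
    return alias

objs = {
    1: b"<< /Type /Page /Contents 3 0 R >>",
    2: b"<< /Length 0 >> stream endstream",
    3: b"<< /Length 0 >> stream endstream",  # duplicate of 2
}
print(dedup(objs))  # {3: 2}
```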
And it is a partial optimization only. Example: same images with a color table as a separate object. In the first pass, the identical color table objects are detected and replaced by one object. Then the images themselves contain the same reference to the color table object, become identical, and are merged in the second run.
But also identical cyclic structures are possible ...
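The multi-pass idea above can be sketched as follows (Python, with toy byte strings standing in for serialized objects; note this naive fixpoint loop does not handle the cyclic case):

```python
import hashlib
import re

def dedup_pass(objects):
    """One pass: merge byte-identical objects, then rewrite all
    indirect references ('n 0 R') to point at the surviving copy."""
    seen, alias = {}, {}
    for num in sorted(objects):
        h = hashlib.sha256(objects[num]).hexdigest()
        if h in seen:
            alias[num] = seen[h]
        else:
            seen[h] = num

    def fix(body):
        return re.sub(
            rb"(\d+) 0 R",
            lambda m: b"%d 0 R" % alias.get(int(m.group(1)), int(m.group(1))),
            body,
        )

    return {n: fix(b) for n, b in objects.items() if n not in alias}, alias

# Two images, each referencing its own (identical) color table:
objs = {
    1: b"<< /ColorSpace 3 0 R >> stream IMG endstream",
    2: b"<< /ColorSpace 4 0 R >> stream IMG endstream",
    3: b"[ /Indexed /DeviceRGB 255 TABLE ]",
    4: b"[ /Indexed /DeviceRGB 255 TABLE ]",
}
while True:
    objs, alias = dedup_pass(objs)
    if not alias:
        break
# Pass 1 merges the color tables (4 -> 3); that makes the images
# identical, so pass 2 merges them too (2 -> 1).
print(sorted(objs))  # [1, 3]
```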
I think the job of pdfTeX is to generate PDF. Optimization is the job of another program; otherwise there are too many things to consider: new compression features of newer PDF versions (compressed cross-reference table, compressed object streams, filters before applying compression, ...).
Yours sincerely
Heiko