[NTG-pdftex] Re: Optimizing the generated pdf

Heiko Oberdiek oberdiek at uni-freiburg.de
Fri Nov 18 23:08:15 CET 2005


On Thu, Nov 17, 2005 at 11:39:06AM +0100, Hans Hagen wrote:

> Martin ??? wrote:
> 
> >On 2005-11-17 09:31:19 +0100, Taco Hoekwater wrote:
> > 
> >
> >>This is mostly the required page objects. 3 objects are used per actual
> >>(totally empty) page:
> >>
> >> 13 0 obj << /Length 0 >> stream endstream endobj
> >>   
> >>
> >
> >This one is even longer with compresslevel > 0. It would be nice
> >to not compress empty streams, but I think that would be too
> >difficult to implement and isn't needed very often.

I think this is the task of a separate optimizer that tries
different compression methods and chooses the best one.
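
As a rough sketch of that idea (not anything pdfTeX does), a separate
optimizer could encode each stream several ways and keep the shortest
result. The function name best_encoding and the set of zlib levels are
assumptions made for illustration. A real tool would also have to count
the extra bytes of the /Filter entry in the stream dictionary.

```python
import zlib


def best_encoding(data: bytes) -> tuple[str, bytes]:
    """Return (label, payload) for the shortest encoding of a stream.

    Hypothetical helper: tries the raw bytes and a few deflate levels.
    """
    # "none" is inserted first so that it wins ties against compression.
    candidates = {"none": data}
    for level in (1, 6, 9):
        candidates[f"flate-{level}"] = zlib.compress(data, level)
    label = min(candidates, key=lambda k: len(candidates[k]))
    return label, candidates[label]


# An empty page stream is best left uncompressed, because deflate
# adds a header and checksum even when there is no input.
empty_label, empty_payload = best_encoding(b"")

# A long, repetitive content stream benefits from compression.
page = b"0 0 m 100 100 l S\n" * 200
page_label, page_payload = best_encoding(page)
```

For the empty stream the uncompressed candidate has length 0 and is
therefore chosen, which matches Martin's observation above.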

> couldn't that be a null object then ?
> 
> >> 12 0 obj << /Type /Page /Contents 13 0 R /Resources 11 0 R
> >>             /MediaBox [0 0 595.2756 841.8898] /Parent 7 0 R >> endobj
> >> 11 0 obj << /ProcSet [ /PDF ] >> endobj
> >>   
> >>
> >
> >And 11 and 13 are created for every page. :-(
> >
> >11 could simply be empty (or null) for empty pages. Looking at
> ><Write out page object@>, it doesn't seem too difficult to
> >optimize for empty Resources. Of course, the question is: How
> >often do we have empty /Resources? Normally they at least have a
> >/Font entry.
> >
> >If we start optimizations like these, it would be nice to move
> >the /MediaBox to the root object (or pages) and write it only for
> >different-sized pages (Hans: ConTeXt writes /TrimBox and /CropBox
> >on every page (even if they are always the same); adding them to
> >pdfpagesattr instead would save quite some space -- look at
> >pdftex-a.pdf).
> >
> > 
> >
> context can have mixed page sizes in one document (which i need -)

Isn't there a standard that forbids inherited properties and
requires /MediaBox to be set in each page object?

> >>It is a bit wasteful to keep those in the indirect objects table
> >>forever, but I am not sure if it is doable to flush
> >>them right away. (CC ntg-pdftex)
> >>   
> >>
> >
> >I don't think that optimizations like these are generally useful
> >as they are seldom needed and make the code more complex. When I
> >look at a typical result of ConTeXt or hyperref, they seem
> >unneeded.
> > 
> >
> indeed; we can have a 'nice to-do' list for that; however, i think that 
> the procset can safely be removed (not used by viewers anyway)
> 
> >Btw: Is there a tool that compresses a pdf by replacing identical
> >objects with references?
> > 
> >
> acrobat professional?
> 
> in pdftex it would mean calculating a checksum for each object before 
> flushing it; it may slow down things a bit

And it is only a partial optimization. Example: identical images,
each with its color table as a separate object. In the first pass, the
identical color table objects are detected and replaced by one
object. Only then do the images themselves contain the same reference
to the color table object, so they become identical and are merged in
the second pass.
Identical cyclic structures are also possible ...
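
A minimal sketch of such a multi-pass merge, assuming a toy model in
which every object is a string and references are written as "N 0 R".
The name dedup_objects and the data layout are made up for
illustration. The object body itself serves as the dictionary key here,
where a real implementation would more likely compare checksums.

```python
import re

# Matches indirect references with generation 0, e.g. "12 0 R".
REF = re.compile(r"\b(\d+) 0 R\b")


def dedup_objects(objects: dict[int, str]) -> tuple[dict[int, str], int]:
    """Merge identical objects until nothing changes.

    Returns the reduced object table and the number of merging passes.
    """
    objs = dict(objects)
    passes = 0
    while True:
        first_seen: dict[str, int] = {}
        remap: dict[int, int] = {}
        for num in sorted(objs):
            body = objs[num]
            if body in first_seen:
                remap[num] = first_seen[body]  # duplicate of an earlier object
            else:
                first_seen[body] = num
        if not remap:
            return objs, passes
        passes += 1

        def redirect(m: re.Match) -> str:
            old = int(m.group(1))
            return f"{remap.get(old, old)} 0 R"

        # Drop the duplicates and point all references at the survivors.
        objs = {n: REF.sub(redirect, b) for n, b in objs.items() if n not in remap}


objects = {
    1: "[ 3 0 R 5 0 R ]",                 # page using both images
    2: "colortable-bytes",
    3: "<< /Image /ColorSpace 2 0 R >>",
    4: "colortable-bytes",                # same color table again
    5: "<< /Image /ColorSpace 4 0 R >>",
}
merged, passes = dedup_objects(objects)
```

The first pass merges the color tables 2 and 4; the images 3 and 5
become identical only after that and are merged in the second pass.
This sketch does not detect identical cyclic structures, because the
members of two such cycles always differ in their reference numbers.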

I think the job of pdfTeX is to generate PDF. Optimization is
the job of another program; otherwise there are too many things
to consider, for example the new compression features of newer
PDF versions: compressed cross-reference tables, compressed object
streams, filters applied before compression, ...

Yours sincerely
  Heiko <oberdiek at uni-freiburg.de>