[NTG-pdftex] Re: Optimizing the generated pdf

Heiko Oberdiek oberdiek at uni-freiburg.de
Fri Nov 18 23:16:31 CET 2005


On Thu, Nov 17, 2005 at 06:46:58PM +0100, Hartmut Henkel wrote:

> On Thu, 17 Nov 2005, Hans Hagen wrote:
> 
> > Martin  wrote:
> >
> > > On 2005-11-17 10:46:51 +0100, Martin Schröder wrote:
> > >
> > > > Btw: Is there a tool that compresses a pdf by replacing identical
> > > > objects with references?
> > >
> > > pdfTeX could do this by itself: store the MD5 of the shortest n
> > > objects (e.g. n = 1024) smaller than x bytes (e.g. x = 1024; longer
> > > objects will typically be unique) and replace new identical objects
> > > with references to the already existing ones.
> 
> I wouldn't want to rely on MD5 alone (shit happens). In the end one
> needs a literal comparison.

I agree.
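The scheme being discussed (hash first, then verify byte for byte before
reusing an object) can be sketched as follows. This is an illustrative
Python sketch, not pdfTeX code; the function name `emit_object` and the
limits are assumptions, not anything from pdfTeX itself.

```python
# Sketch: deduplicate short PDF objects by MD5, but never trust the hash
# alone -- fall back to a literal byte comparison before reusing an object.
import hashlib

MAX_LEN = 1024   # only objects smaller than this are dedup candidates
index = {}       # md5 digest -> (object number, raw bytes)

def emit_object(num, raw):
    """Return the object number to reference: an earlier identical
    object if one exists, otherwise register this object as new."""
    if len(raw) < MAX_LEN:
        digest = hashlib.md5(raw).digest()
        hit = index.get(digest)
        if hit is not None and hit[1] == raw:   # literal comparison
            return hit[0]                       # reuse the earlier object
        index.setdefault(digest, (num, raw))
    return num
```

For example, emitting the same `<</S /GoTo /D [n 0 R /Fit]>>` body twice
would yield the first object's number both times.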

> And when the object is gone, it's nasty to seek
> around in the PDF file.

The position and length of the objects could be stored in memory.

> > > This would e.g. condense all the obj <</S /GoTo /D [n 0 R /Fit]>>
> > > endobj in the pdfTeX manual. :-)
> 
> if it's enough to scan the last, say, 100 non-stream objects, this can
> be done; at least it would catch similar objects that sit next to each
> other.

The number of matches between "similar" objects can be increased by normalization:
* Removal of unnecessary spaces.
* Ordering of dictionary keys.
* Normalization of strings and names.

Disadvantage: parsing of PDF objects would be necessary.
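A minimal sketch of such normalization for flat dictionaries (whitespace
collapsed, keys sorted) is below. It is illustrative only and makes strong
assumptions: no nested dictionaries, no strings with embedded delimiters;
a real implementation would need a proper PDF tokenizer, which is exactly
the parsing cost mentioned above.

```python
# Sketch: normalize a flat PDF dictionary before hashing, so that
# <</D [5 0 R /Fit] /S /GoTo>> and <</S /GoTo /D [5 0 R /Fit]>> match.
import re

def normalize_dict(obj):
    body = obj.strip()
    assert body.startswith("<<") and body.endswith(">>")
    inner = body[2:-2]
    # Crude tokenizer: names, bracketed arrays, bare numbers/keywords.
    toks = re.findall(r"/[^\s/\[\]<>]+|\[[^\]]*\]|[^\s/\[\]<>]+", inner)
    # Pair each key with the following value token, then sort by key.
    pairs = sorted(zip(toks[0::2], toks[1::2]))
    return "<<" + " ".join(k + " " + v for k, v in pairs) + ">>"
```

With this, the two orderings of the GoTo dictionary above normalize to
the same string and would hash identically.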

> Maybe MD5 would be overkill, just a hash + comparison would be ok.

Yes.

Yours sincerely
  Heiko <oberdiek at uni-freiburg.de>
-- 

