[NTG-pdftex] Re: Optimizing the generated pdf

Hartmut Henkel hartmut_henkel at gmx.de
Thu Nov 17 18:46:58 CET 2005

On Thu, 17 Nov 2005, Hans Hagen wrote:

> Martin Schröder wrote:
> > On 2005-11-17 10:46:51 +0100, Martin Schröder wrote:
> >
> > > Btw: Is there a tool that compresses a pdf by replacing identical
> > > objects with references?
> >
> > pdfTeX could do this by itself: Store the md5 of the shortest n
> > objects (e.g. n = 1024) smaller than x bytes (e.g. x = 1024, longer
> > objects will typically be unique) and replace new identical objects
> > with references to the already existing ones.

I wouldn't want to rely on MD5 alone (shit happens). In the end one
needs a literal comparison. And once the object has been written out,
it's nasty to seek around in the PDF file to read it back.

> > This would e.g. condense all the obj <</S /GoTo /D [n 0 R /Fit]>>
> > endobj in the pdfTeX manual. :-)

If it's enough to scan the last, say, 100 non-stream objects, this can
be done; at least it would catch these similar objects that sit next to
each other. Maybe MD5 would be overkill, just a hash + comparison would
be ok. As it happens, these non-streams are collected in a separate
buffer here before being written out. Let's see...
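
A rough sketch of that idea, in Python rather than in pdfTeX's own
source language, and not the actual pdfTeX code: keep the last few
small non-stream objects in memory, use a cheap hash (CRC32 here, an
arbitrary choice) only as a filter, and confirm every hit with a
literal byte comparison. The names RecentObjectPool, window and
max_size are made up for illustration; max_size just mimics the
proposed setting where 0 means no checking.

```python
import zlib
from collections import OrderedDict


class RecentObjectPool:
    """Remember the last `window` small non-stream objects (hypothetical helper)."""

    def __init__(self, window=100, max_size=1024):
        self.window = window          # how many recent objects to keep
        self.max_size = max_size      # size limit in bytes; 0 disables checking
        self._recent = OrderedDict()  # object number -> (hash, body)

    def add(self, objnum, body):
        """Return the object number that references to `body` should use."""
        if self.max_size == 0 or len(body) > self.max_size:
            return objnum             # too large or checking off: never shared
        h = zlib.crc32(body)
        for known_num, (known_hash, known_body) in self._recent.items():
            # The hash only filters candidates; the literal comparison decides.
            if known_hash == h and known_body == body:
                return known_num
        self._recent[objnum] = (h, body)
        if len(self._recent) > self.window:
            self._recent.popitem(last=False)  # forget the oldest object
        return objnum


pool = RecentObjectPool(window=100, max_size=1024)
a = pool.add(5, b"<</S /GoTo /D [7 0 R /Fit]>>")
b = pool.add(9, b"<</S /GoTo /D [7 0 R /Fit]>>")   # identical to object 5
c = pool.add(10, b"<</S /GoTo /D [8 0 R /Fit]>>")  # different destination
print(a, b, c)  # prints: 5 5 10
```

Since the bodies stay in the buffer, the comparison never has to seek
in the PDF file. The caller would still have to skip writing object 9
and point all references to it at object 5 instead; that part is left
out here.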

> such a feature makes sense indeed; maybe even configurable:
> \pdfshareobjsize=1024 % with 0 meaning no checking done

That would fit in this case.

Regards, Hartmut

