[NTG-pdftex] PDF inclusion with annotations

Andreas Matthias amat@kabsi.at
23 May 2003 01:06:17 +0200

Last month on the pdftex mailing list we had a short discussion about
whether annotations of external PDFs could be copied when including
PDFs. I gave it a try and now have a first partial success. The code
is far from complete and still has a lot of bugs. Nevertheless I
think now is the right time to show you the code, so we can discuss
whether this is the right way to go.

You can download the patches at:


Here is a short description of the changes I made:

(1) While reading the image (function read_pdf_info), the annotations
of this image are read, too, and a map with all relevant details
about the annotations is set up.

(2) While writing the reference (/Im1 Do) to the content stream
(function out_image), a dummy annotation node is created to get a
correct /Annots array. Those dummy annotation nodes are caught in
module @<Flush out PDF annotations@> and are simply deleted; no dummy
annotation object is written out.

(3) Finally, in function epdf_write, all annotation objects of this
image are copied and written out.

Another important thing that happens within the function out_image
is the calculation of the new coordinates of the annotation's /Rect.
There are no big problems if an image is scaled with the help of the
<rule spec> of \pdfximage. But there are a lot of problems if
macro packages do something like \pdfliteral{.5 0 0 .5 0 0 cm}
for scaling, and similar things for rotating images. To calculate
the coordinates of /Rect, pdftex must know exactly which coordinate
transformations are actually taking place. So I introduced two
new primitives: \pdfsetctm and \pdfresetctm.

Instead of 

   \pdfximage {doc.pdf}
   \pdfliteral{q 0 1 -1 0 0 0 cm} 

macro packages should do

   \pdfximage {doc.pdf}
   \pdfsetctm 0 1 -1 0 0 0

Now pdftex does know about the current transformation matrix and
can calculate the correct coordinates of /Rect.
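
To illustrate the calculation: in PDF, the six numbers a b c d e f of
a transformation matrix map a point (x, y) to
(a*x + c*y + e, b*x + d*y + f). The following is a minimal Python
sketch of how a /Rect could be recomputed under such a matrix. The
function names are made up for illustration and are not taken from
the patch, and using the bounding box of the four transformed corners
is an assumption about what a "correct" /Rect means after a rotation.

```python
def apply_ctm(ctm, x, y):
    """Map the point (x, y) through the PDF matrix [a b c d e f]."""
    a, b, c, d, e, f = ctm
    return (a * x + c * y + e, b * x + d * y + f)


def transform_rect(ctm, rect):
    """Return the axis-aligned bounding box of rect after applying ctm.

    rect is (llx, lly, urx, ury), as in an annotation's /Rect entry.
    All four corners are transformed, because a rotation moves the
    lower-left corner away from the lower-left position.
    """
    llx, lly, urx, ury = rect
    corners = [apply_ctm(ctm, x, y) for x in (llx, urx) for y in (lly, ury)]
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return (min(xs), min(ys), max(xs), max(ys))


# The rotation by 90 degrees used in the example above.
rotated = transform_rect((0, 1, -1, 0, 0, 0), (10, 20, 110, 70))
```

With the 90-degree rotation, a rectangle that was wider than tall
comes out taller than wide, which is why the original /Rect cannot
simply be copied.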

Since TeX does not have a function to scan floating-point numbers,
I used scan_dimen as a workaround to scan the arguments of \pdfsetctm.
That's why you must write

   \pdfsetctm 0bp 1bp -1bp 0bp 0bp 0bp

so far. It should be no problem to change this in the future.
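
A sketch of what this workaround implies, assuming the scanned value
arrives as an integer number of scaled points (sp), which is how TeX
stores dimensions: dividing by the number of sp per bp recovers the
plain number, up to a small rounding error. The function name is
hypothetical and not part of the patch.

```python
SP_PER_PT = 65536                    # TeX: 1pt = 65536sp
SP_PER_BP = SP_PER_PT * 72.27 / 72   # 1in = 72.27pt = 72bp


def dimen_to_number(scaled_points):
    """Convert a dimension in sp into the number that was written in bp."""
    return scaled_points / SP_PER_BP


# TeX stores dimensions as whole sp, so "1bp" does not come back as
# exactly 1.0; the difference is far below anything visible on a page.
approx_one = dimen_to_number(65781)
```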

The primitives \pdfsetctm and \pdfresetctm create whatsit nodes.
When these nodes are shipped out, \pdfsetctm pushes its arguments
onto a stack and \pdfresetctm pops them again. When a \pdfrefximage
node is shipped out, it can look at the stack to get the current
CTM and calculate a correct /Rect.
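
The push/pop behaviour can be sketched in Python as follows. This is
a toy model, not the code from the patch: the class and method names
are invented, and the rule that nested matrices are concatenated
(innermost applied first, as with nested cm operators in PDF) is an
assumption about how "the current CTM" would be derived from the
stack.

```python
IDENTITY = (1, 0, 0, 1, 0, 0)


def concat(m1, m2):
    """Return the matrix that applies m1 first and then m2."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2)


class CtmStack:
    def __init__(self):
        self._stack = []

    def set_ctm(self, matrix):
        """Model of shipping out a \\pdfsetctm node: push its arguments."""
        self._stack.append(tuple(matrix))

    def reset_ctm(self):
        """Model of shipping out a \\pdfresetctm node: pop the last matrix."""
        self._stack.pop()

    def current(self):
        """Combine all matrices on the stack, outermost first."""
        total = IDENTITY
        for m in self._stack:
            total = concat(m, total)
        return total


stack = CtmStack()
stack.set_ctm((0.5, 0, 0, 0.5, 0, 0))   # outer: scale by one half
stack.set_ctm((0, 1, -1, 0, 0, 0))      # inner: rotate by 90 degrees
combined = stack.current()              # what an image node would see here
stack.reset_ctm()
stack.reset_ctm()
```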

The tarball contains 2 plain TeX files and 2 LaTeX files which 
show some examples. To compile the LaTeX files you must apply
the patch pdftex.def.diff to pdftex.def.

And now comes the unpleasant part. Here is a list of major bugs
that those patches have :-(

* If an image is included several times, all copies have the same
  image number. Unfortunately I used this image number as the key
  to the map of annotations. That's why you cannot include pages
  several times without getting wrong /Rects. This bug shouldn't
  be too difficult to fix.

* \pdfsetctm should only be used in horizontal mode. I have no
  idea why it chokes in vertical mode.

* There should be no material between \pdfsetctm and \pdfrefximage.
  Any material there would cause pdftex to call pdf_set_origin,
  which introduces a new CTM not taken into account by \pdfsetctm.

* LaTeX, graphicx: foolish things like
  \includegraphics[angle=90, angle=90]{doc.pdf} cause a call to
  the function pdf_set_origin and therefore don't work either.

* There must not be a page break between \pdfsetctm and \pdfresetctm.

* There are probably more ...
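
The first bug in the list can be illustrated with a toy model in
Python. None of this is taken from the patch: the function names and
data layout are hypothetical, and keying the map by a per-inclusion
counter is only one conceivable fix, shown here as an assumption.

```python
def record_by_image_number(inclusions):
    """Key the /Rect map by image number only (the buggy scheme)."""
    rects = {}
    for image_number, rect in inclusions:
        rects[image_number] = rect   # a later inclusion overwrites an earlier one
    return rects


def record_by_inclusion(inclusions):
    """Key the /Rect map by (image number, inclusion index) instead."""
    rects = {}
    for index, (image_number, rect) in enumerate(inclusions):
        rects[(image_number, index)] = rect
    return rects


# The same image (number 1) placed twice, at different sizes.
inclusions = [(1, (0, 0, 100, 50)), (1, (0, 200, 50, 225))]
buggy = record_by_image_number(inclusions)   # only the last /Rect survives
fixed = record_by_inclusion(inclusions)      # one /Rect per placement
```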