3. Regression testing in pdfTeX: A change that only aims at e.g. performance enhancements should not alter the document. Many other changes of course will change it.
I think that you must define different levels of similarity:

- pure text: in that case a bitmap, as already discussed, is needed
- functionality (annotations and such): that needs to take place at the PDF level, i.e. filtering resources and descriptions and comparing them (e.g. annotation names, rectangles, etc.)
- font resources

I can imagine that your test of text similarity is run with interactive features disabled.

The second one could be an add-on for pdftex: a special log mode in which pdftex writes a file with all annotations (name, page, rectangle, maybe also the whole dict) and a second file that lists all the used fonts, encoding files, map lines and glyphs (encoding subset).

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
| www.pragma-pod.nl
-----------------------------------------------------------------
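
To make the annotation-log idea above concrete, here is a minimal sketch of how two such logs might be compared between a reference run and a test run. The log mode is only a proposal, so the record format used here (one `name page llx lly urx ury` record per line, whitespace separated, names without spaces, `%` starting a comment) is purely an assumption for illustration, as are the function names `parse_annot_log` and `diff_annots` and the tolerance value.

```python
from typing import NamedTuple


class Annot(NamedTuple):
    name: str
    page: int
    rect: tuple  # (llx, lly, urx, ury) as floats


def parse_annot_log(text):
    """Parse the hypothetical log format: 'name page llx lly urx ury' per line."""
    annots = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        if len(fields) != 6:
            raise ValueError(f"malformed annotation record: {line!r}")
        name, page = fields[0], int(fields[1])
        rect = tuple(float(c) for c in fields[2:])
        annots[(page, name)] = Annot(name, page, rect)
    return annots


def diff_annots(old, new, tol=0.01):
    """Report annotations that are missing, added, or moved by more than tol."""
    problems = []
    for key in sorted(old.keys() - new.keys()):
        problems.append(f"missing: {old[key].name} on page {old[key].page}")
    for key in sorted(new.keys() - old.keys()):
        problems.append(f"added: {new[key].name} on page {new[key].page}")
    for key in sorted(old.keys() & new.keys()):
        pairs = zip(old[key].rect, new[key].rect)
        if any(abs(a - b) > tol for a, b in pairs):
            problems.append(f"moved: {old[key].name} on page {old[key].page}")
    return problems


if __name__ == "__main__":
    reference = "link:intro 1 72 700 200 712\nlink:toc 2 72 650 180 662\n"
    candidate = "link:intro 1 72 700 200 712\nlink:toc 2 72 640 180 652\n"
    for problem in diff_annots(parse_annot_log(reference),
                               parse_annot_log(candidate)):
        print(problem)
```

A font log (fonts, encoding files, map lines, glyph subsets) could presumably be compared in the same way, keyed on font name instead of page and annotation name.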