[Fwd: Producing Tagged PDF or PDF/A from LaTeX (ISO 19005-1)]

25 Oct 2005

-------- Original Message --------
Subject: Producing Tagged PDF or PDF/A from LaTeX (ISO 19005-1)
Date: 25 Oct 2005 13:36:19 GMT
From: n05W43+mgk25@cl.cam.ac.uk (Markus Kuhn)
Organization: University of Cambridge, England
Newsgroups: comp.text.tex

Is there any solution on the horizon for turning a LaTeX document into
a "Tagged PDF" file, that is, a PDF 1.4 file written in such a way
that the underlying Unicode plaintext in it can be recovered smoothly
(for easier searching, cut&paste, NLP parsing, Braille, speech
synthesis, etc.)?

Background:

The new international standard ISO 19005-1:2005 defines an
"electronic document file format for long-term preservation" called
PDF/A-1. This is basically a subset of PDF 1.4, with lots of nasty and
dangerous stuff (JavaScript, external references, missing fonts,
encryption, etc.) removed and various historic ambiguities in the
PDF spec clarified.

It was mainly developed to make PDF useful for legal-deposit purposes.
All very good and commendable for anyone using PDF in electronic
publishing and archival applications.

However:

Full compliance with the PDF/A-1 format requires that "Tagged PDF"
is used, such that the underlying plaintext remains accessible for
further processing.

How could I do that from LaTeX?

Markus

-- 
Markus Kuhn, Computer Laboratory, University of Cambridge
http://www.cl.cam.ac.uk/~mgk25/ || CB3 0FD, Great Britain

Rolf Niepraschk
