[Fwd: Producing Tagged PDF or PDF/A from LaTeX (ISO 19005-1)]
-------- Original Message --------
Subject: Producing Tagged PDF or PDF/A from LaTeX (ISO 19005-1)
Date: 25 Oct 2005 13:36:19 GMT
From: n05W43+mgk25@cl.cam.ac.uk (Markus Kuhn)
Organization: University of Cambridge, England
Newsgroups: comp.text.tex

Is there any solution on the horizon for turning a LaTeX document into a "Tagged PDF" file, that is, a PDF 1.4 file written in such a way that the underlying Unicode plaintext in it can be recovered smoothly (for easier searching, cut&paste, NLP parsing, Braille, speech synthesis, etc.)?

Background:

The new international standard ISO 19005-1:2005 defines an "electronic document file format for long-term preservation" called PDF/A-1. This is basically a subset of PDF 1.4, with lots of nasty and dangerous stuff (JavaScript, external references, missing fonts, encryption, etc.) removed and various historic ambiguities in the PDF spec clarified. It was mainly developed to make PDF useful for legal-deposit purposes. All very good and commendable for anyone using PDF in electronic publishing and archival applications.

However: Full compliance with the PDF/A-1 format requires that "Tagged PDF" is used, such that the underlying plaintext remains accessible for further processing.

How could I do that from LaTeX?

Markus

-- 
Markus Kuhn, Computer Laboratory, University of Cambridge
http://www.cl.cam.ac.uk/~mgk25/ || CB3 0FD, Great Britain
participants (1): Rolf Niepraschk