On 1/27/2019 10:00 PM, Ross Moore wrote:
PDFs are now editable, at least in Acrobat Pro.
weren't they always, given fonts being available?
So knowing which characters are available lets software determine easily whether an edit that changes or adds characters in a text block can be performed with the embedded font subset alone, or whether a font substitution is needed for that specific edit.
Of course it is preferable not to have to substitute, as this can change the metrics and hence make a noticeable difference to the visual appearance of that text block.
If you have ever tried to edit a PDF made by someone else (with TeX or Word or …) then you should have experienced how things can move around significantly within the same paragraph.
i never edit pdf documents (ok, i remember that once i had to strip stuff in order to get a logo, but not adding something) ... imo editing a pdf makes no sense (and reflow even less) ... also, with respect to fonts, editing assumes all glyphs being present, and with open type fonts one also enters a feature mess, as gsub/gpos are not embedded ... also, editing contradicts archiving
At what level?
primitive (one can omit cidsets and charsets) and i added a setter at the lua end (there was already one for cidsets)
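for reference, a minimal sketch of that control, assuming the new charset setter mirrors the existing cidset names:

    % omit the CIDSet / CharSet entries from the FontDescriptors
    \pdfvariable omitcidset  = 1
    \pdfvariable omitcharset = 1  % the newly added one

    % or from the lua end:
    \directlua{
      pdf.setomitcidset(1)   % was already there
      pdf.setomitcharset(1)  % new setter, name assumed to match
    }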
Can it be done on a font-by-font basis? That would be ideal.
hm, in principle that can be implemented but i don't think that will happen (also, when one uses so-called wide fonts there are no charsets, because the type 1 font becomes a sort of simple opentype)
If it is just a command-line option when calling lualatex, then that is workable. Essentially it would require a user to have done a preflight check and found that one of the fonts has a CharSet problem. Then they rerun with the option set, to get a valid PDF/A-2 (or A-3) document.
same control as pdftex: primitives
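so, hedging on the exact name, the workflow sketched above would come down to one line in the preamble once a preflight check has flagged a CharSet problem (the primitive is assumed to be \pdfomitcharset, by analogy with the luatex variable):

    % drop the (possibly incomplete) CharSet entries for the
    % whole run, trading later editability for PDF/A validity
    \pdfomitcharset=1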
It would affect all the Type 1 fonts, not just one of them. The ability (described above) to later edit the PDF would be lost pretty much entirely.
how many people will keep using type 1 fonts ... (i only use a few that i only have in type 1, like optima nova, but even that one is used as a wide font)
My understanding of the code in writefont.c is that the Font Descriptor dictionary is constructed (and written) as a complete object before the font subset itself is constructed. For the CharSet, the entries in gl_tree are used, based upon a list of the characters explicitly used from that font. This does *not* include implicitly used glyphs, such as /grave (and perhaps /a) pulled in by /agrave.
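To make that concrete, here is a hypothetical FontDescriptor of the kind writefont.c writes, for a subset where only /agrave was typeset explicitly; object numbers and names are made up for illustration:

    12 0 obj
    << /Type /FontDescriptor
       /FontName /ABCDEF+SomeFont   % made-up subset tag
       /Flags 34
       /FontFile 13 0 R             % the Type 1 subset itself
       % (other required keys omitted)
       % only explicitly used characters get recorded:
       /CharSet (/agrave)
       % yet the subsetted charstring for /agrave pulls in the
       % base glyphs, so a validator expects:
       %   /CharSet (/a/agrave/grave)
    >>
    endobj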
which is why you use tounicode -)
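e.g. the usual pair in pdftex-based runs, so that extraction does not have to guess from glyph names:

    % attach /ToUnicode CMaps to the embedded fonts
    \input glyphtounicode
    \pdfgentounicode=1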
It was such a circumstance that initiated this conversation roughly a year ago. I looked at solutions like writing the accent characters in white, outside the page boundaries, as an /Artifact say. But this begets a range of difficulties, and could potentially affect the pagination or typesetting, and can fail other accessibility checks. I want to develop reliable means to construct documents simultaneously for both Archivability and Accessibility.
in luatex (and probably also in pdftex) the font id (instance) is often the font in the text stream, and as it has specific widths it gets a dictionary with a few properties, referring to a parent font that is shared; the question is what pdftex does when more than 255 glyphs are referred to from one type 1 font, but i guess that this doesn't happen often in pdftex usage (one can try to include the full ec, texnansi, qx, some vietnamese pagella fonts and see what happens)
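a sketch of that experiment (plain pdftex; the tfm names follow the tex-gyre convention, adjust to what is installed):

    % reference all 256 slots of one Type 1 family (TeX Gyre
    % Pagella) through four encodings, so well over 255 distinct
    % glyphs of the same font file end up being used
    \font\pagec   = ec-qplr        % ec (T1)
    \font\pagetxn = texnansi-qplr  % texnansi (LY1)
    \font\pageqx  = qx-qplr        % QX
    \font\pagevn  = t5-qplr        % vietnamese (T5)

    \newcount\slot
    \def\sampler#1{\begingroup #1\slot=0
      \loop \char\slot \advance\slot by 1 \ifnum\slot<256 \repeat
      \endgroup\par}

    \sampler\pagec \sampler\pagetxn \sampler\pageqx \sampler\pagevn
    \bye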
From the veraPDF link that Reinhard provided, it seems that presence is checked with PDF/A-1, but not accuracy.
my impression is that fonts are never validated (there are all kinds of properties that one needs to keep with font objects, so that is a special kind of check) ... viewers of course can complain, but even then, i had cases where acrobat complained and showed nothing while mupdf did, and vice versa (private dict stuff and so on)
But for PDF/A-2 and 3, there is a more detailed check for accuracy.
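For example, the veraPDF command-line tool can be pointed at a specific profile (an illustrative invocation, not a tested one):

    verapdf --flavour 2b document.pdf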
Perhaps true for viewers; but PDFs are becoming about *more* than just the visual view. We want to be providing the structures required for accurate text extraction and editing. TeX was never designed with this in mind, but because of its programmability this is something that should be achievable.
sure, and it has been ... but even with that, embedding a structured source for processing makes more sense to me (which of course is not what publishers want) (more accurate would be: this is not what macro packages and the tex way of entering content are designed for, nor what users have in mind; tex itself can do pretty much anything)

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------