On 26 Jan 2019, at 8:09 pm, Hans Hagen <j.hagen@xs4all.nl> wrote:

> PDF/A-2 and PDF/A-3 relax many of those ‘may not include’s,
> which are mostly things that TeX does support.
> The optionality of /CharSet is just another such relaxation.

just wondering: do you see any technical advantage in this CharSet bit
array, other than it being an option to predict maybe font memory
allocation demands or so (which then in turn is useless as the pdf
format has many aspects that will bloat memory usage anyway)

I can envisage a possible use for knowing which glyphs are available
internally in a font subset. PDFs are now editable, at least in
Acrobat Pro. So knowing what characters are available lets software
easily determine whether a simple edit that changes or adds characters
to a text block can be performed using the embedded font subset, or
whether a font substitution is needed to do the specific edit.
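
To make that concrete, here is a minimal sketch of the test an editor
could do. It assumes the CharSet value is available as a plain string of
slash-prefixed glyph names, and that the glyph names needed for the
edited text are already known; both function names are made up for
illustration, and the mapping from characters to glyph names is not
addressed.

```python
def parse_charset(charset):
    """Split a CharSet string such as '/a/agrave/grave' into glyph names."""
    return {name.strip() for name in charset.split("/") if name.strip()}


def missing_glyphs(charset, needed):
    """Return the needed glyph names that the CharSet does not list."""
    return sorted(set(needed) - parse_charset(charset))


charset = "/a/b/c/agrave/space"

# Every glyph is listed: the edit can reuse the embedded subset.
print(missing_glyphs(charset, ["a", "b", "space"]))   # []

# 'eacute' is not listed: a font substitution would be needed.
print(missing_glyphs(charset, ["a", "eacute"]))       # ['eacute']
```

An empty result means the embedded subset is enough; anything else means
substitution, with the metric changes that implies.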

Of course it is preferable not to have to substitute, as this can
change the metrics, potentially making a noticeable change to the
visual appearance of that text block. If you have ever tried to edit
a PDF made by someone else (with TeX or Word or …) then you will have
experienced how things can move around significantly within the same
paragraph.

>> Anyway, right now the choices are a) omit /CharSet or
>> b) output a possibly-incorrect CharSet.
>
> If there was a primitive that can control this, then that would
> potentially be enough, at least for the present.
> It would allow the CharSet to be omitted with PDF/A-2,3
> but included with PDF/A-1.

in luatex it's an option

At what level?
Can it be done on a font-by-font basis? That would be ideal.

If it is just a command-line option when calling lualatex, then that is
kind of workable. Essentially it would require a user to have done a
preflight check and found that one of the fonts has a CharSet problem,
then rerun with the option set to get a valid PDF/A-2 (or 3) document.
But it would affect all the Type-1 fonts, not just one of them, and
the ability (described above) to later edit the PDF would be lost
pretty much entirely.

> This distinction would need to be documented (in pdfx.pdf, say)
> so that authors can understand the issue and choose the appropriate
> package-loading option for their own circumstances.
> I’m happy to do this.

> But I’ve not yet looked at how the subsetted font is constructed.
> My thought is that the latter needs to adjust the gl_tree before it is
> used.
> As I said previously, this will be a timing issue; so I’m not confident that
> I could correctly write the necessary coding, using programming structures
> that I don’t fully understand.

i don't know about pdftex but it is something delayed to the last when
the 'combined' font resource is added as different tex fonts using the
same resource can get different entries (and width arrays) but share the
blobs

My understanding of the code in writefont.c is that the Font Descriptor
dictionary is constructed (and written) as a complete object, before
the font subset itself is constructed. For the CharSet, the entries in
gl_tree are used, based upon a list of the characters explicitly using
that font. This does *not* include implicit glyphs, such as /grave
(and perhaps /a) with /agrave. It was such a circumstance that
initiated this conversation roughly a year ago.
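
As an illustration of the adjustment I have in mind, here is a small
sketch that closes the set of explicitly used glyphs over their
components before the CharSet string is built. This is not the code in
writefont.c: a plain Python set stands in for gl_tree, and the
`components` table (which composite glyph pulls in which others) is
assumed to be known; how it would be read from the font program is not
addressed.

```python
def with_components(used, components):
    """Add the glyphs that composite glyphs pull in implicitly."""
    closed = set()
    pending = list(used)
    while pending:
        glyph = pending.pop()
        if glyph in closed:
            continue
        closed.add(glyph)
        # Hypothetical table: composite glyph name -> component names.
        pending.extend(components.get(glyph, ()))
    return closed


def charset_string(glyphs):
    """Format glyph names as a CharSet-style string, sorted by name."""
    return "".join("/" + name for name in sorted(glyphs))


components = {"agrave": ("a", "grave")}   # assumed, for illustration
used = {"agrave", "b"}                    # glyphs used explicitly

print(charset_string(used))                               # /agrave/b
print(charset_string(with_components(used, components)))  # /a/agrave/b/grave
```

The first string is the kind of list that omits the implicit glyphs;
the second is what the list would need to be.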

I looked at solutions like writing the accent characters in white,
outside the page boundaries, as an /Artifact say. But this begets a
range of difficulties: it could potentially affect the pagination or
typesetting, and can fail other accessibility checks.

I want to develop reliable means to construct documents simultaneously
for both Archivability and Accessibility.

>> (I highly doubt that Thanh has time to
>> look into this.) Sorry, but that's the reality. -k

it's probably not that complex; i also doubt if the quality of that
vector should be perfect as probably only its presence is checked, not
its internal validity (which then would also demand checking fonts which
afaik doesn't happen in detail); and i bet that viewers ignore its
content anyway

From the veraPDF link that Reinhard provided, it seems that presence
is checked with PDF/A-1, but not accuracy. But for PDF/A-2 and 3,
there is a more detailed check for accuracy.
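
A rough sketch of that difference, as I read it from this thread. It
does not reproduce the actual veraPDF rules: the function names are
invented, the set of glyph names in the embedded subset is assumed to be
given, and the exact comparison applied for PDF/A-2 and 3 (including any
special treatment of .notdef) is an assumption.

```python
def charset_names(charset):
    """Glyph names listed in a CharSet string, or None if it is absent."""
    if charset is None:
        return None
    return {name.strip() for name in charset.split("/") if name.strip()}


def check_charset(pdfa_part, charset, subset_glyphs):
    """Return a list of problems; an empty list means the check passes."""
    names = charset_names(charset)
    if pdfa_part == 1:
        # Presence only: the content is not compared with the font.
        return [] if names is not None else ["CharSet is missing"]
    if names is None:
        # Optional for PDF/A-2 and 3, so omitting it is acceptable.
        return []
    problems = []
    missing = sorted(set(subset_glyphs) - names)
    extra = sorted(names - set(subset_glyphs))
    if missing:
        problems.append("not listed in CharSet: " + ", ".join(missing))
    if extra:
        problems.append("listed but not in font: " + ", ".join(extra))
    return problems


subset = {"a", "grave", "agrave"}   # glyphs actually in the subset

print(check_charset(1, "/agrave", subset))  # []
print(check_charset(2, "/agrave", subset))  # ['not listed in CharSet: a, grave']
print(check_charset(2, None, subset))       # []
```

So the same incomplete CharSet passes for PDF/A-1 but fails for
PDF/A-2, while omitting it passes for PDF/A-2 only.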

Perhaps true for viewers; but PDFs are becoming about *more* than just
the visual view. We want to be providing the structures required for
accurate text extraction and editing. TeX was never designed with this
in mind, but because of its programmability this is something that
should be achievable.

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------