Hi Karl,

On 26 Jan 2019, at 9:20 am, Karl Berry <karl@freefriends.org> wrote:

[ https://mailman.ntg.nl/pipermail/ntg-pdftex/2018-June/004251.html ]

Can anyone tell me how this issue has been resolved, if at all?

I removed the inclusion of /CharSet in the output in the pdftex source
(back in July). We did not compile new TL binaries with that change. As
things stand, it will be included in TL19. I imagine Christian has
compiled new MiKTeX binaries since then. Hence the difference in behavior.

has provided that produces a “missing CharSet” error when validated
with Adobe’s Preflight for PDF/A-1b.

That "error" from preflight contradicts the previous information, which
was that CharSet was optional, but had to be completely correct if it
was included.

It seems that the “optionality” applies to PDF/A-2 and PDF/A-3,
but /CharSet is required in PDF/A-1b (and presumably PDF/A-1a also).

Since PDF/A-1b is what most people try to create when instructed
that they need to submit their thesis as PDF/A, it is wrong to
always omit the CharSet.

Inclusion/omission should be controllable by a primitive, allowing a user
or package-writer to select the proper mode.


Here’s how it is worded in the ISO 19005-2 specification:

If the FontDescriptor dictionary of an embedded Type 1 font contains a CharSet string, then it shall list the character names of all glyphs present in the font program, regardless of whether a glyph in the font is referenced or used by the PDF or not. 

NOTE 2 The above requirement makes normative the statements in ISO 32000-1:2008, 9.8. 


For subsetted fonts, it is not 100% clear whether this means:
 1.  all glyphs in the unsubsetted font program
      (something which should be easy to achieve)
or 
 2.  all glyphs within the font program which is the subsetted font.


There is no way to make it completely correct (precomposed
accent agony, etc.). Therefore the idea was to omit it.

It is surely just the latter (2.) that is harder to achieve, since current methods
look only at the names of the referenced glyphs, and not at whether any of these
reference further glyphs in turn.
Yet the font subset being built must include the subroutines for those
extra referenced glyphs, so at some point the required tests are being performed.
That information has to be passed back to the list of glyph names used to build the CharSet.
Thus it is really a timing problem, not something that is inherently impossible to achieve.
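
To illustrate what I mean by passing that information back, here is a rough
sketch in Python (not pdftex code). It assumes a hypothetical table
`components` that maps each composite glyph to the glyphs it pulls in, as an
accented Type 1 glyph does for its base letter and accent. The function and
variable names are mine, purely for illustration.

```python
def charset_closure(referenced, components):
    """Return every glyph name the subset must contain.

    referenced -- glyph names used directly by the document
    components -- hypothetical map: composite glyph -> glyphs it references
    """
    needed = set()
    stack = list(referenced)
    while stack:
        name = stack.pop()
        if name in needed:
            continue  # already handled; also guards against cycles
        needed.add(name)
        # Follow the references made by this glyph, if any.
        stack.extend(components.get(name, ()))
    return needed


def charset_string(names):
    """Format glyph names PDF-style: each name preceded by a slash.

    .notdef is left out, on my reading of the CharSet entry in ISO 32000-1.
    """
    return "".join("/" + n for n in sorted(names) if n != ".notdef")


# Toy data, not taken from any real font.
components = {"eacute": ["e", "acute"], "Aring": ["A", "ring"]}
used = ["eacute", "x", ".notdef"]

closure = charset_closure(used, components)
charset = charset_string(closure)
```

With the toy data, `closure` also contains `e` and `acute`, which a list built
from the referenced names alone would miss.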


-k

So could you please consider introducing a primitive that takes a numerical value;
e.g.
  0  =  omit the CharSet
  1  =  build the CharSet as at present
  2  =  build the CharSet from *all* glyph names in the unsubsetted font

Then we can test whether option 2 gives valid documents in all PDF/A levels
and flavours; if so it can become the default.

If option 2 doesn’t work, then the coding for option 1 will need to be enhanced.
There could then be an experimental option 3, which would become the default
once perfected.
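
As a rough sketch of the intended behaviour for values 0, 1 and 2 (Python
rather than pdftex source; the constant and function names are hypothetical,
and no such primitive exists yet):

```python
# Hypothetical values of the proposed primitive.
OMIT, REFERENCED, ALL_GLYPHS = 0, 1, 2


def build_charset(mode, referenced, all_glyphs):
    """Return the CharSet string for the given mode, or None to omit it.

    referenced -- glyph names currently collected for the subset
    all_glyphs -- every glyph name in the unsubsetted font
    """
    if mode == OMIT:
        return None
    if mode == REFERENCED:
        names = referenced
    elif mode == ALL_GLYPHS:
        names = all_glyphs
    else:
        raise ValueError("unknown CharSet mode: %r" % (mode,))
    # Each name is preceded by a slash; .notdef is not listed.
    return "".join("/" + n for n in sorted(set(names)) if n != ".notdef")


# Toy data, not taken from any real font.
font_glyphs = [".notdef", "A", "B", "C"]
doc_glyphs = ["B", "A"]

omitted = build_charset(OMIT, doc_glyphs, font_glyphs)
as_now = build_charset(REFERENCED, doc_glyphs, font_glyphs)
full = build_charset(ALL_GLYPHS, doc_glyphs, font_glyphs)
```

A package writer could then select the value according to the PDF/A level
being targeted.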


All the best,

Ross