Re: [NTG-pdftex] Incomplete CharSet causes failure with PDF/A validation

26 Jan 2019


On 1/26/2019 12:28 AM, Ross Moore wrote:
...
Hi Karl,
...
On 26 Jan 2019, at 10:01 am, Karl Berry <karl@freefriends.org> wrote:
   If the FontDescriptor dictionary of an embedded Type 1 font contains
   a CharSet string, then
I see nothing in that wording that implies CharSet is anything but
entirely optional.
That wording is for PDF/A-2, not for PDF/A-1.
The PDF document from MiKTeX, which alerted me to this, does *not* show
the CharSet error when validated for PDF/A-2 or PDF/A-3.
It *does* show the error for PDF/A-1 validation.
(I’ll copy you my response to the author, in a separate email.)
There are many ways in which PDF/A-1 is stricter than later versions.
See here:   (page 3)
https://www.pdfa.org/wp-content/until2016_uploads/2011/06/19005-1_FAQ.pdf
PDF/A-1 files must include:
• Embedded fonts
• Device-independent color
• XMP metadata
PDF/A-1 files may not include:
• Encryption
• LZW Compression
• Embedded files
• External content references
• PDF Transparency
• Multi-media
• JavaScript
PDF/A-2 and PDF/A-3 relax many of those ‘may not include’ restrictions,
which mostly concern things that TeX does support.
The optionality of /CharSet is just another such relaxation.
just wondering: do you see any technical advantage in this CharSet
string, other than maybe as a way to predict font memory allocation
demands (which in turn is useless, as the pdf format has many aspects
that will bloat memory usage anyway)?
...
...
Anyway, right now the choices are a) omit /CharSet or
b) output a possibly-incorrect CharSet.
If there was a primitive that can control this, then that would
potentially be enough, at least for the present.
It would allow the CharSet to be omitted with PDF/A-2,3
but included with PDF/A-1.
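
A minimal sketch of that switch, written in Python rather than in pdfTeX's
own C. The function names and the `pdfa_part` argument are hypothetical;
this is neither an existing primitive nor pdfTeX code, just the policy
"include for PDF/A-1, omit for PDF/A-2,3" spelled out:

```python
def omit_charset_for(pdfa_part):
    """True when /CharSet should be left out (hypothetical policy helper).

    pdfa_part is 1, 2 or 3 for PDF/A-1, PDF/A-2, PDF/A-3.
    """
    return pdfa_part in (2, 3)


def font_descriptor_entries(font_name, charset, omit_charset):
    """Return a simplified FontDescriptor as a dict of PDF-syntax strings."""
    entries = {"Type": "/FontDescriptor", "FontName": "/" + font_name}
    if not omit_charset:
        # /CharSet is a PDF string, so it is wrapped in parentheses.
        entries["CharSet"] = "(" + charset + ")"
    return entries


for part in (1, 2, 3):
    d = font_descriptor_entries("ABCDEF+CMR10", "/A/B", omit_charset_for(part))
    print(part, "CharSet" in d)
```

With this policy a possibly-incorrect CharSet is only ever written in the
one case where a validator insists on finding the key.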
in luatex it's an option
...
This distinction would need to be documented (in pdfx.pdf, say)
so that authors can understand the issue and choose the appropriate
package-loading option for their own circumstances.
I’m happy to do this.
...
If you want to have a third option c) <something else>, you (or someone)
will need to send me a patch.
I’ve looked at the coding in writefont.c for how gl_tree is set and used.
But I’ve not yet looked at how the subsetted font is constructed.
My thought is that the latter needs to adjust the gl_tree before it is used.
As I said previously, this will be a timing issue; so I’m not confident that
I could correctly write the necessary coding, using programming structures
that I don’t fully understand.
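
To make the idea concrete, here is a small Python sketch, not pdfTeX's
actual code. It assumes that the subsetter can pull in glyphs beyond those
the document references directly (components of composite glyphs, for
instance), and that it can report them; `added_by_subsetter` is a
hypothetical name for that report. The CharSet is then built from the
union, once the subset is final:

```python
def build_charset(used_glyphs, added_by_subsetter=()):
    """Build a /CharSet string naming every glyph in the final subset.

    used_glyphs: glyph names the document references directly.
    added_by_subsetter: extra names the subsetter kept (assumed input).
    """
    names = set(used_glyphs) | set(added_by_subsetter)
    # .notdef is assumed to exist in every subset and is not listed.
    names.discard(".notdef")
    # The order of names is free; sorting only makes the output reproducible.
    return "".join("/" + name for name in sorted(names))


print(build_charset(["A", "Aacute", ".notdef"], ["acute"]))
# prints /A/Aacute/acute
```

The timing issue is visible in the signature: the second argument is only
known after subsetting, so the string cannot be written any earlier.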
i don't know about pdftex, but it is something that gets delayed until
the very end, when the 'combined' font resource is added, as different
tex fonts using the same resource can get different entries (and width
arrays) but share the blobs
...
...
(I highly doubt that Thanh has time to
look into this.) Sorry, but that's the reality. -k
it's probably not that complex; i also doubt that the quality of that
vector needs to be perfect, as probably only its presence is checked, not
its internal validity (which would then also demand checking the fonts,
which afaik doesn't happen in detail); and i bet that viewers ignore its
content anyway
Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------