[NTG-pdftex] Incomplete CharSet causes failure with PDF/A validation

Karl Berry karl at freefriends.org
Thu Jun 14 23:27:26 CEST 2018


Ross and all - back on your mail about /CharSet from two years ago.

    Date: Sat, 11 Jun 2016 00:15:05 +0000
    From: Ross Moore <ross.moore at mq.edu.au>

    [ https://mailman.ntg.nl/pipermail/ntg-pdftex/2016-June/004087.html ]

As far as I can tell, the problem as reported comes down to the seac
operator: glyphs that are referenced only through seac never make it
into /CharSet. Heiko, Thanh, Ross, anyone, up for looking into the code
to get the seac referents into the output /CharSet list? Not something I
am familiar with, so it would take me a while. (More below.)

For example, this pdftex -ini file shows the problem, independent of
pdf/x or LaTeX:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\pdfoutput=1 \catcode`\{=1 \catcode`\}=2
\pdfcompresslevel=0 \pdfobjcompresslevel=0
\pdfglyphtounicode{aacute}{00E1}\pdfgentounicode=1
\font\b = fver8t \b % ecrm1000
\hsize=5pt
\hfil\char225 % aacute
\end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
$ pdftex -ini foo.tex
..
$ fgrep -a /CharSet foo.pdf
/CharSet (/aacute)

Whereas the correct output should also include /a and /acute (as it does
with gs, which has code to handle the seac pieces):
/CharSet(/a/aacute/acute)

Looking at the definition for /aacute (t1disasm <fver8a.pfb):
/aacute {
	50 596 hsbw
	192 170 0 97 194 seac
	} ND
where the last two seac operands are StandardEncoding codes: 97 is "a"
(the base character) and 194 is "acute" (the accent). Just have to
insert those into the output list, presumably.
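
To make the operand layout concrete, here is a minimal C sketch of mapping
the two character codes in a seac call back to glyph names. It is not
pdftex code: std_name() and seac_components() are hypothetical names, and
the table is only a four-entry excerpt of StandardEncoding (writet1.c
uses its full standard_glyph_names[] array for this lookup).

```c
#include <stddef.h>

/* Hypothetical excerpt of Adobe StandardEncoding: only the codes needed
   for this illustration are present; the real table has 256 slots. */
static const char *std_name(int code)
{
    switch (code) {
    case 97:  return "a";
    case 101: return "e";
    case 193: return "grave";
    case 194: return "acute";
    default:  return NULL;      /* not in this excerpt */
    }
}

/* The two glyphs a seac call refers to. */
struct seac_parts {
    const char *base;           /* from bchar */
    const char *accent;         /* from achar */
};

/* seac takes five operands: asb adx ady bchar achar.
   Only the last two name glyphs; the first three are positioning. */
static struct seac_parts seac_components(const int operands[5])
{
    struct seac_parts p;
    p.base = std_name(operands[3]);
    p.accent = std_name(operands[4]);
    return p;
}
```

Called with the operands from /aacute above (192 170 0 97 194), this
yields "a" as the base and "acute" as the accent, which are the two names
missing from the pdftex /CharSet.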

The code to output /CharSet from the glyph tree is in pdftexdir/writefont.c:
            pdf_puts("/CharSet (");
            for (glyph = (char *) avl_t_first(&t, fd->gl_tree); glyph != NULL;
                 glyph = (char *) avl_t_next(&t))
                pdf_printf("/%s", glyph);

And the code to handle seac is in writet1.c:
            case CS_SEAC:
                a1 = cc_get(3);
                a2 = cc_get(4);
                cc_clear();
                mark_cs(standard_glyph_names[a1]);
                mark_cs(standard_glyph_names[a2]);
                break;

"Just" have to get these pieces together, which doesn't seem like it
should be too hard ... ?
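
As a rough idea of the shape of a fix, and only that, the following
standalone C sketch keeps a sorted set of unique glyph names, lets the
seac components be added to it alongside the composite glyph, and then
formats the /CharSet entry. Everything in it is an assumption for
illustration: struct glyph_set, glyph_set_add() and format_charset() are
made-up names, and the sorted array merely stands in for the AVL tree
behind fd->gl_tree.

```c
#include <stdio.h>
#include <string.h>

#define MAX_GLYPHS 256

/* Stand-in for the glyph tree: a sorted array of unique names. */
struct glyph_set {
    const char *names[MAX_GLYPHS];
    size_t count;
};

/* Insert name in sorted position, ignoring duplicates.
   Returns 0 on success (or if already present), -1 if the set is full. */
static int glyph_set_add(struct glyph_set *s, const char *name)
{
    size_t i = 0, j;

    while (i < s->count && strcmp(s->names[i], name) < 0)
        i++;
    if (i < s->count && strcmp(s->names[i], name) == 0)
        return 0;
    if (s->count == MAX_GLYPHS)
        return -1;
    for (j = s->count; j > i; j--)
        s->names[j] = s->names[j - 1];
    s->names[i] = name;
    s->count++;
    return 0;
}

/* Write "/CharSet (/name1/name2...)" into buf.
   Returns 0 on success, -1 if buf is too small. */
static int format_charset(const struct glyph_set *s, char *buf, size_t size)
{
    size_t used, i;
    int n = snprintf(buf, size, "/CharSet (");

    if (n < 0 || (size_t) n >= size)
        return -1;
    used = (size_t) n;
    for (i = 0; i < s->count; i++) {
        n = snprintf(buf + used, size - used, "/%s", s->names[i]);
        if (n < 0 || (size_t) n >= size - used)
            return -1;
        used += (size_t) n;
    }
    n = snprintf(buf + used, size - used, ")");
    if (n < 0 || (size_t) n >= size - used)
        return -1;
    return 0;
}
```

Adding "aacute" for the glyph actually used, then "a" and "acute" at the
point where the seac operands are resolved, gives a set with those three
names in sorted order. Whether the real change belongs in the CS_SEAC
case or somewhere else in writet1.c is exactly the open question.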


By the way, I checked a few other fonts. For EC (ecrm1000), Latin Modern
(ec-lmr10), and txfonts (t1xr) (mentioned in
tex.stackexchange.com/questions/81927), seac is not used. This is a
reasonable choice for font implementors, as seac is widely deprecated:
among other things, it assumes both components are in Adobe
StandardEncoding.

dvips|ps2pdf does not output the /a and /acute for those fonts either;
presumably Adobe programs don't either. This seems correct, since the /a
and /acute character definitions are not in fact used in those cases.
Hopefully that is ok with the new standards. I can't imagine a decent
way to change it. --best, karl.

P.S. Unrelated to the problem, but I noticed while looking into it ...
although at one time writet1.c in pdftex and dvips were close, Pali's
changes last year to support encodings for bitmap fonts made the two
versions very different. I doubt they can reasonably be merged again. :(.

