[NTG-pdftex] Patch 8BitString (was: Bug: <hex> string to () string)

Heiko Oberdiek oberdiek@uni-freiburg.de
Tue, 29 Jul 2003 22:20:34 +0200



Hello,

here is the fix for the image color problem, which turns out to be
a problem in how strings are written in PDF syntax:

Patch 8BitString

I wanted to include PDF slides from:
  http://ais.informatik.uni-freiburg.de/lehre/ss03/ki/slides/01-intro.pdf
with pdfTeX, but the colors in some of the images are terribly wrong.

Driver file:

% test.tex
\nopagenumbers
\pdfcompresslevel=0
\pdfximage width 297bp height 210bp page 14 {01-intro.pdf}
% similar page 9, ...
\pdfrefximage\pdflastximage
\bye

Analysis:

                    01-intro.pdf / test.pdf

/Page object:       107 0  /  3 0
It references via /XObject the /Image object:
/Image object:      110 0  /  8 0
With
/ColorSpace object: 111 0  /  7 0

Start of object 111 0 of 01-intro.pdf:

111 0 obj
[/Indexed
/DeviceRGB
255
<0402070402300402500402700402840402EC3502066E021585020A9A02159C02
...

Start of object 7 0 of test.pdf with one line break for readability:

7 0 obj
[/Indexed/DeviceRGB 255
(\004\002\007\004\0020\004\002P\004\002p\004\002\37777777604
\004\002\377777777545\002\006n\002\025\37777777605\002...

==> The translation of 8-bit characters that have their eighth bit set
    goes terribly wrong:
      hex(84) --> string(\37777777604)
      hex(EC) --> string(\37777777754)
    It seems that the octal number is written as a 32-bit number
    instead of an 8-bit number.
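
The two values above can be reproduced with a small C sketch. It is
only an illustration, not pdfTeX code: the helper name
sign_extended_octal is made up, and the widening is written out
arithmetically so that the result does not depend on whether plain
"char" is signed on the machine at hand.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not part of pdfTeX): treat `byte` as a signed
 * 8-bit value, widen it to 32 bits, and format the resulting bit
 * pattern in octal. This mimics what happens when a signed char is
 * assigned to an int and then printed with "%03o". */
static void sign_extended_octal(unsigned char byte, char *buf, size_t size)
{
    /* Bytes 0x80..0xFF represent negative values in a signed 8-bit type. */
    int32_t widened = (byte >= 0x80) ? (int32_t)byte - 256 : (int32_t)byte;

    /* Converting to uint32_t exposes the sign-extended bit pattern. */
    snprintf(buf, size, "%03" PRIo32, (uint32_t)widened);
}
```

Called with 0x84 and 0xEC, the helper yields 37777777604 and
37777777754, the same digits that show up in test.pdf. A byte below
0x80, such as 0x41, is unaffected and yields 101.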


Another test file scenario:

% include.tex:
\pdfinfo{/Title<0204840204>}
\nopagenumbers
\hbox{}
\bye

% test.tex:
\pdfcompresslevel=0
\pdfximage width 210bp height 297bp {test.pdf}
\pdfrefximage\pdflastximage
\nopagenumbers
\bye

==> /Title (\002\004\37777777604\002\004)

After the following patch:

==> /Title (\002\004\204\002\004)

Problem detected in pdftoepdf.cc:

static void copyObject(Object *obj)
{
    ...
    int  i, l, c;
    ...
    else if (obj->isString()) {
        s = obj->getString();
        p = s->getCString();
        l = s->getLength();
        if (strlen(p) == (unsigned int)l) {
            pdf_puts("(");
            for (; *p != 0; p++) {
                /* original: c = *p; */
                c = (unsigned char)*p;  /* fix */
                if (c == '(' || c == ')' || c == '\\')
                    pdf_printf("\\%c", c);
                else if (c < 0x20 || c > 0x7F)
                    pdf_printf("\\%03o", c);
                else
                    pdfout(c);
            }
            pdf_puts(")");
        }

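The sign extension of *p with type "char" (8 bit, signed on this
platform) to c with type "int" (32 bit) can explain the effect:
a byte with its eighth bit set becomes a negative int, passes the
test c < 0x20, and is then printed by "%03o" as a 32-bit value.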

Alternatively
  pdf_printf("\\%03o", c);
can be changed to
  pdf_printf("\\%03o", (unsigned char)c);
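
The fix can be tried in isolation with the following sketch. It is
not the pdfTeX source: escape_pdf_string is a hypothetical
stand-alone function that writes into a caller-supplied buffer
instead of calling pdf_puts, pdf_printf and pdfout, but it follows
the same branches as the excerpt above, with the (unsigned char)
cast applied.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-alone variant of the string branch shown above.
 * Writes the NUL-terminated input `s` as a PDF literal string into
 * `out`. Returns the number of characters written (without the
 * terminating NUL), or -1 if `out` is too small. */
static int escape_pdf_string(const char *s, char *out, size_t size)
{
    size_t n = 0;
    const char *p;

    if (size < 3)                       /* room for "()" and the NUL */
        return -1;
    out[n++] = '(';
    for (p = s; *p != 0; p++) {
        int c = (unsigned char)*p;      /* the fix: c is in 0..255 */

        if (n + 6 > size)               /* 4-char escape + ')' + NUL */
            return -1;
        if (c == '(' || c == ')' || c == '\\')
            n += (size_t)snprintf(out + n, size - n, "\\%c", c);
        else if (c < 0x20 || c > 0x7F)
            n += (size_t)snprintf(out + n, size - n, "\\%03o",
                                  (unsigned int)c);
        else
            out[n++] = (char)c;
    }
    out[n++] = ')';
    out[n] = '\0';
    return (int)n;
}
```

For the bytes 02 04 84 02 04 from the /Title test, the function
produces (\002\004\204\002\004), which matches the output after the
patch. Because c can no longer be negative, the branch c > 0x7F is
the one that handles bytes with the eighth bit set.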

Patch files based on 2003/07/29 v1.11a:
  pdftoepdf.cc.diff   for TeX/texk/web2c/pdftexdir/pdftoepdf.cc

Yours sincerely
  Heiko <oberdiek@uni-freiburg.de>
-- 
Attachment: pdftoepdf.cc.diff

*** pdftoepdf.cc.org	Tue Jul 29 21:41:13 2003
--- pdftoepdf.cc	Tue Jul 29 22:06:34 2003
***************
*** 559,565 ****
          if (strlen(p) == (unsigned int)l) {
              pdf_puts("(");
              for (; *p != 0; p++) {
!                 c = *p;
                  if (c == '(' || c == ')' || c == '\\')
                      pdf_printf("\\%c", c);
                  else if (c < 0x20 || c > 0x7F)
--- 559,565 ----
          if (strlen(p) == (unsigned int)l) {
              pdf_puts("(");
              for (; *p != 0; p++) {
!                 c = (unsigned char)*p;
                  if (c == '(' || c == ')' || c == '\\')
                      pdf_printf("\\%c", c);
                  else if (c < 0x20 || c > 0x7F)
