[NTG-pdftex] Error with \pdfglyphtounicode when surrogates are involved.

Ross Moore ross.moore at mq.edu.au
Wed May 31 10:03:37 CEST 2017


Hi Akira,

On May 31, 2017, at 2:59 PM, Akira Kakuto <kakuto at fuk.kindai.ac.jp> wrote:

Hi Karl,

I realize you're reporting a separate bug, that the value gets
misinterpreted

\pdfglyphtounicode{Z}{D835DC81}

<5A> <36E537DC81>

I confirmed that Ross's
\pdfglyphtounicode{Z}{D835 DC81}
with a space works ok.

In the case of
\pdfglyphtounicode{Z}{D835DC81}
I encountered an assertion error because
long code = 0XD835DC81 < 0 in my case, where
sizeof(long) = 4.

Ross obtained erroneously vh = 0X36E537, vl = 0XDC81
because long code = 0XD835DC81 > 0, if sizeof(long) = 8.
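
For reference, both outcomes can be reproduced outside pdfTeX. This is only a sketch: high_unit, low_unit and as_32bit_long are hypothetical names, not functions from tounicode.c, and int64_t stands in for an 8-byte long so that the result does not depend on the platform.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not part of tounicode.c. They repeat the arithmetic
   of the else-branch of utf16be_str with a fixed 64-bit type. */
unsigned high_unit(int64_t code)
{
    assert(code >= 0x10000);
    return (unsigned)((code - 0x10000) / 0x400 + 0xD800);
}

unsigned low_unit(int64_t code)
{
    assert(code >= 0x10000);
    return (unsigned)((code - 0x10000) % 0x400 + 0xDC00);
}

/* Models a 32-bit two's-complement long: keep the low 32 bits of the
   value and reinterpret them as a signed number. */
int64_t as_32bit_long(int64_t code)
{
    uint32_t low = (uint32_t)code;      /* reduction modulo 2^32 */
    if (low > (uint32_t)INT32_MAX)
        return (int64_t)low - 4294967296LL;
    return (int64_t)low;
}
```

With code = 0xD835DC81, high_unit and low_unit give 0x36E537 and 0xDC81, i.e. the <36E537DC81> shown above, while as_32bit_long returns a negative number, which is what trips the assert when sizeof(long) = 4.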

Is assert(code >= 0 && code <= 0X10FFFF) OK or not OK?

I'm thinking this is OK, *provided* spaces are used to separate the codes
when more than one code is required.

Otherwise there should be just a single Unicode code point, and the
allowable range for this is <= 0X10FFFF.
Indeed, the top end of this (0X100000 upwards) is for “Private Use” only.
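
A minimal sketch of that rule, assuming a 64-bit integer so that the check behaves the same whatever sizeof(long) is. The name utf16be_checked is hypothetical; this is not the pdfTeX function, and it returns NULL instead of asserting so that a caller could report the bad value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical checked encoder. Returns the UTF-16BE hex string for a
   single code point, or NULL if the value lies outside 0..0x10FFFF. */
const char *utf16be_checked(int64_t code)
{
    static char buf[16];

    if (code < 0 || code > 0x10FFFF)
        return NULL;
    if (code <= 0xFFFF) {
        snprintf(buf, sizeof buf, "%04X", (unsigned)code);
    } else {
        int64_t v = code - 0x10000;
        unsigned vh = (unsigned)(v / 0x400 + 0xD800);
        unsigned vl = (unsigned)(v % 0x400 + 0xDC00);
        /* inside the valid range both halves are genuine surrogates */
        assert(vh >= 0xD800 && vh <= 0xDBFF);
        assert(vl >= 0xDC00 && vl <= 0xDFFF);
        snprintf(buf, sizeof buf, "%04X%04X", vh, vl);
    }
    return buf;
}
```

Under this check 0xD835DC81 is rejected on any platform, whereas the code point 0x1D481 encodes to D835DC81, the pair that the unspaced input was presumably meant to express.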


My understanding of these pieces of code:

    for (i = 0; i < l; i++) {
        if (p[i] == ' ')
            valid_unistr = 2;   /* if a space occurs we treat this entry as a string */


    if (valid_unistr == 2) {    /* a string with space(s) */
        /* copy p to buf2, ignoring spaces */
        for (q = buf2; *p != 0; p++)
            if (*p != ' ')
                *q++ = *p;
        *q = 0;
        gu->code = UNI_STRING;
        gu->unicode_seq = xstrdup(buf2);

  … is that blocks of 4-6 hex digits are just copied verbatim, without calling check_unicode_value,
so that the assert is never actually encountered.

Do you agree with this interpretation?
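
To make that reading concrete, here is a stand-alone sketch of the copy loop. The function name strip_spaces and the buffer size DEMO_BUF_SIZE are assumptions for this example only; the real code uses buf2 and xstrdup as quoted above.

```c
#include <assert.h>
#include <string.h>

#define DEMO_BUF_SIZE 256   /* assumed size, for this sketch only */

/* Hypothetical stand-alone version of the quoted loop: copies p into a
   static buffer, skipping spaces, and performs no range check at all. */
const char *strip_spaces(const char *p)
{
    static char buf2[DEMO_BUF_SIZE];
    char *q = buf2;

    assert(p != NULL && strlen(p) < sizeof buf2);
    for (; *p != 0; p++)
        if (*p != ' ')
            *q++ = *p;
    *q = 0;
    return buf2;
}
```

On this reading, "D835 DC81" is stored as the string D835DC81 without ever being converted to a number, which would explain why the spaced form works.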



(from tounicode.c)
static char *utf16be_str(long code)
{
  static char buf[SMALL_BUF_SIZE];
  long v;
  unsigned vh, vl;

  assert(code >= 0);

  if (code <= 0xFFFF)
      sprintf(buf, "%04lX", code);
  else {
      v = code - 0x10000;
      vh = v / 0x400 + 0xD800;
      vl = v % 0x400 + 0xDC00;
      sprintf(buf, "%04X%04X", vh, vl);
  }
  return buf;
}

Best,
Akira


Cheers,

Ross


Dr Ross Moore
Mathematics Dept | 12 Wally’s Walk, 734
Macquarie University, NSW 2109, Australia
T: +61 2 9850 8955  |  F: +61 2 9850 8114
M: +61 407 288 255  |  E: ross.moore at mq.edu.au

http://www.maths.mq.edu.au


