[pdftex-Bugs][4321] Illegal entry in bfrange block in ToUnicode CMap

19 Dec 2010

Bugs item #4321, was opened at 2010-11-25 14:21
Status: Open
Priority: 3
Submitted By: Heiko Oberdiek (oberdiek)
Assigned to: Nobody (None)
Summary: Illegal entry in bfrange block in ToUnicode CMap 
Category: None
Group: None
Resolution: Accepted

Initial Comment:
Hello,

pdfTeX complains
  Error: Illegal entry in bfrange block in ToUnicode CMap
for valid cmap entries, when a PDF file is included.
The CMap entries are, for example:

1 beginbfrange
<0041><0041><0041>
endbfrange

The error disappears with:

1 beginbfrange
<41><41><0041>
endbfrange

The error is in function CharCodeToUnicode::parseCMap1 in file
libs/xpdf-3.02/xpdf/CharCodeToUnicode.cc

For poppler, the problem has already been reported, with a patch:

http://lists.freedesktop.org/archives/poppler-bugs/2010-April/004931.html

The appended test file can be processed by "pdftex --ini", "pdftex" or
"pdflatex".

Yours sincerely
  Heiko

----------------------------------------------------------------------
...
Comment By: The Thanh Han (hanthethanh)
Date: 2010-12-19 03:44
Message:
Fixed in svn stable.

----------------------------------------------------------------------

Comment By: Taco Hoekwater (taco)
Date: 2010-11-26 07:19

Message:
ToUnicode is a little odd because it uses CMap syntax with a
few extra limitations that are only in the pdf reference,
and these seem to come from a really weird bit of Acroread
implementation code.

I have not looked at the input closely, so I could be
missing the point a little, but this could be the problem:

The hex number scanning in AR (Adobe Reader) is closely related to the
begincodespacerange ... endcodespacerange block. If the code
space range is one byte, then all hex numbers have to be
specified in two digits, and if the code space range is two
bytes, then all further hex numbers have to be given in four
digits.
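
The rule described above can be sketched as a small check. This is only
an illustration of the digit-count constraint, not Adobe Reader's or
xpdf's actual code; the function names are hypothetical, and the sketch
assumes that only the two source codes of each bfrange line are checked
(the destination is a UTF-16BE value and is left alone).

```python
import re

HEX = re.compile(r"<([0-9A-Fa-f]+)>")

def codespace_bytes(cmap_text):
    """Byte width of the first codespace range entry, or None if absent."""
    m = re.search(r"begincodespacerange\s*<([0-9A-Fa-f]+)>", cmap_text)
    return len(m.group(1)) // 2 if m else None

def mismatched_bfrange_entries(cmap_text):
    """Return bfrange lines whose source codes do not use exactly
    two hex digits per codespace byte (hypothetical strict reading)."""
    nbytes = codespace_bytes(cmap_text)
    if nbytes is None:
        return []
    bad = []
    for block in re.findall(r"beginbfrange(.*?)endbfrange", cmap_text, re.S):
        for line in block.strip().splitlines():
            codes = HEX.findall(line)
            if len(codes) < 3:
                continue  # not a complete <lo><hi><dst> entry
            lo, hi = codes[0], codes[1]
            if len(lo) != 2 * nbytes or len(hi) != 2 * nbytes:
                bad.append(line.strip())
    return bad
```

With a one-byte code space range such as <00><FF>, this check flags
<0041><0041><0041> and accepts <41><41><0041>.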

----------------------------------------------------------------------

Comment By: The Thanh Han (hanthethanh)
Date: 2010-11-26 02:32

Message:
We will apply the mentioned patch from poppler.

Regarding the case

<0041><0041><0042>

it works fine with Preview (OS X) and Acrobat 9, so I think it's a viewer
issue.

Thanh

----------------------------------------------------------------------

Comment By: Heiko Oberdiek (oberdiek)
Date: 2010-11-25 14:48

Message:
Hello,

I have made further experiments by
replacing the last <0041> by <0042>.
The "A" of the input file should then get
converted to "B" by copy&paste.
This works for the line
  <41><41><0042>
with AR7/Linux,
however it fails ("A" instead of "B") in case
of
  <0041><0041><0042>
The PDF specification shows in section
"5.9 Extraction of Text Content" entries
with four hexadecimal digits.
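
For reference, here is a minimal sketch of what a bfrange entry is
expected to mean when the hex strings are read purely as numbers, so
that <41> and <0041> denote the same source code. The helper name
expand_bfrange is hypothetical, and the sketch assumes a single
UTF-16BE destination that is simply incremented across the range.

```python
def expand_bfrange(lo_hex, hi_hex, dst_hex):
    """Map every code in [lo, hi] to a Unicode string, starting at dst."""
    lo, hi = int(lo_hex, 16), int(hi_hex, 16)
    base = int(dst_hex, 16)
    nbytes = len(dst_hex) // 2  # width of the UTF-16BE destination
    mapping = {}
    for code in range(lo, hi + 1):
        # Assumption: the destination grows by one per source code.
        raw = (base + (code - lo)).to_bytes(nbytes, "big")
        mapping[code] = raw.decode("utf-16-be")
    return mapping

print(expand_bfrange("41", "41", "0042"))
print(expand_bfrange("0041", "0041", "0042"))
```

Under this reading both spellings give the same mapping from code 0x41
to "B", which is the result expected from copy&paste.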

Can someone shed some light on this obscurity?

Yours sincerely
  Heiko

----------------------------------------------------------------------

You can respond by visiting: 
http://sarovar.org/tracker/?func=detail&atid=493&aid=4321&group_id=106

pdftex-bugs@sarovar.org
