[NTG-pdftex] [ pdftex-Patches-580 ] Patch to make ToUnicode for Type1 fonts

noreply at sarovar.org noreply at sarovar.org
Fri Oct 6 13:34:01 CEST 2006


Patches item #580, was opened at 2006-07-14 20:57
You can respond by visiting: 
http://sarovar.org/tracker/?func=detail&atid=495&aid=580&group_id=106

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: The Thanh Han (hanthethanh)
Assigned to: Nobody (None)
Summary: Patch to make ToUnicode for Type1 fonts

Initial Comment:
This is a patch to pdftex so that it can create ToUnicode entries for
Type1 fonts. The main purpose is to make ligatures and some other glyphs,
such as small-cap letters or oldstyle digits from OpenType fonts,
searchable. This patch also contains a minor fix that allows the use of
fonts without embedding, for example MinionPro or MyriadPro (which are
distributed with Acrobat Reader >= 7.0, but whose use is restricted to
Acrobat Reader only).

How to apply:
~~~~~~~~~~~~~
- this patch applies to the pristine pdftex-1.40.0-beta-20060213 sources
  only; if you have applied any other patches to the sources, please
  discard them and start from fresh ones.

- how to apply:

,--------
| cd /path/to/pdftex-1.40.0-beta-20060213/src
| cat /path/to/the/patch | patch -p1
| ./configure
| cd texk/web2c
| make pdfetex
`--------

If you want to be careful, try the patch with the option '--dry-run'
first to see whether it can be applied without problems.


Usage:
~~~~~~
Add the following lines to your document, somewhere near the beginning:

,--------
| \input glyphtounicode.tex
| \pdfgentounicode=1
`--------

Customization:
~~~~~~~~~~~~~
If pdftex cannot generate the right ToUnicode value for some glyphs
(probably because the glyph name is not ``known'' to pdftex), it is
possible to add further entries so that pdftex can learn how to generate
Unicode values for such ``unknown'' glyphs.

The syntax is simple:

\pdfglyphtounicode{<glyph-name>}{<unicode-value>}

Example:

\pdfglyphtounicode{A}{0041}

says that glyph 'A' maps to Unicode U+0041.

The entries in glyphtounicode.tex cover the Adobe Glyph List
(glyphlist.txt version 2.0) and some additional glyphs (texglyphlist.txt
version 2.33, from lcdf-typetools), plus some additional entries for
ligatures.

If a glyph name cannot be found, pdftex applies some simple name
translations:

- remove any ".xxx" suffix from the glyph name, where "xxx" is a string
  consisting of alphabetic characters. For example "A.sc" => "A"

- remove a suffix like "small", "oldstyle", "inferior" or "superior"
  from the glyph name. For example "Asmall" => "A"

The resulting name is then looked up again to find a Unicode value.
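
The fallback described above can be sketched roughly as follows. This is
only an illustration in Python, not the code from the patch: the table
contents, the function names and the order in which the two translations
are tried are all assumptions.

```python
import re

# Hypothetical stand-in for the table filled by \pdfglyphtounicode entries.
GLYPH_TO_UNICODE = {"A": "0041", "one": "0031", "f": "0066"}

# Word suffixes named in the description above.
WORD_SUFFIXES = ("small", "oldstyle", "inferior", "superior")

def simplify(name):
    """Apply the simple name translations to a glyph name."""
    # Drop a trailing ".xxx" where xxx is purely alphabetic: "A.sc" -> "A".
    name = re.sub(r"\.[A-Za-z]+$", "", name)
    # Drop one of the listed word suffixes: "Asmall" -> "A".
    for suffix in WORD_SUFFIXES:
        if name.endswith(suffix) and len(name) > len(suffix):
            return name[: -len(suffix)]
    return name

def lookup(name):
    """Return the hex string for a glyph name, or None if it stays unknown."""
    if name in GLYPH_TO_UNICODE:
        return GLYPH_TO_UNICODE[name]
    # Not found directly: translate the name and look it up again.
    return GLYPH_TO_UNICODE.get(simplify(name))
```

With this table, lookup("A.sc") and lookup("Asmall") both end up at the
entry for "A".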

Ligatures require a special form of ToUnicode. Example:

\pdfglyphtounicode{ff}{00660066}

here '0066' is the Unicode value of 'f'. Some ligatures have names like
'f_f_i'; in that case the command should be

\pdfglyphtounicode{f_f_i}{006600660069}

i.e. '_' is removed from the glyph name, and then each letter is
translated to its Unicode value.
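
To illustrate the rule, here is a small Python sketch that builds the
second argument from such a ligature name. The function name is made up
for this example, and it assumes every component between the '_'
separators is a single letter; it is not part of the patch.

```python
def ligature_to_hex(glyph_name):
    """Build the hex string for a ligature name such as 'f_f_i'."""
    parts = glyph_name.split("_")
    for part in parts:
        # Assumption of this sketch: each component is one letter.
        if len(part) != 1:
            raise ValueError(f"unsupported component {part!r} in {glyph_name!r}")
    # Each letter becomes four uppercase hex digits, e.g. 'f' -> '0066'.
    return "".join(f"{ord(part):04X}" for part in parts)

print(ligature_to_hex("f_f_i"))  # 006600660069
```

The result is the value used in the \pdfglyphtounicode{f_f_i}{...} line
above.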


----------------------------------------------------------------------

>Comment By: The Thanh Han (hanthethanh)
Date: 2006-10-06 11:34

Message:
Logged In: YES 
user_id=710

This is a patch to pdftex that includes:
  - a fix for bug #611
  - some changes to ToUnicode support (patch #580) to make the
    implementation follow the guidelines at
    http://partners.adobe.com/public/developer/opentype/index_glyph.html


How to apply:
~~~~~~~~~~~~~
- this patch applies to the pristine pdftex-1.40.0-beta-20060928 sources
  only; if you have applied any other patches to the sources, please
  discard them and start from fresh ones.

- how to apply:

,--------
| cd /path/to/pdftex-1.40.0-beta-20060928/src
| cat /path/to/the/patch | patch -p1
| ./configure
| cd texk/web2c
| make pdftex
`--------

If you want to be careful, try the patch with the option '--dry-run'
first to see whether it can be applied without problems.


Usage:
~~~~~~
Add the following lines to your document, somewhere near the beginning:

,--------
| \input glyphtounicode.tex
| \pdfgentounicode=1
`--------

Customization:
~~~~~~~~~~~~~
If pdftex cannot generate the right ToUnicode value for some glyphs
(probably because the glyph name is not ``known'' to pdftex), it is
possible to add further entries so that pdftex can learn how to generate
Unicode values for such ``unknown'' glyphs.

The syntax is simple:

,--------
| \pdfglyphtounicode{<glyph-name>}{<unicode-value>}
`--------

Example:

,--------
| \pdfglyphtounicode{A}{0041}
`--------

says that glyph 'A' maps to Unicode U+0041.

\pdfglyphtounicode requires that the second parameter consist only of
uppercase hexadecimal digits (0..9, A..F) and spaces. If this is not the
case, the entry is simply discarded (with a warning). Later entries
overwrite previous entries with the same name (first argument).

The entries in glyphtounicode.tex cover:
  - glyphlist.txt       (Adobe Glyph List v2.0)
  - zapfdingbats.txt    (ITC Zapf Dingbats Glyph List v2.0)
  - texglyphlist.txt    (lcdf-typetools texglyphlist.txt v2.33)
  - additional.tex      (additional entries)

Ligatures require a special form of ToUnicode. Example:

,--------
| \pdfglyphtounicode{ff}{00660066}
`--------

here '0066' is the Unicode value of 'f'. Spaces are ignored in the
second parameter of \pdfglyphtounicode, hence it is possible to write
the above command as

,--------
| \pdfglyphtounicode{ff}{0066 0066}
`--------

which is easier to read and understand IMO.
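
The rules for the second parameter given in this message (uppercase hex
digits and spaces only, invalid entries discarded with a warning, spaces
ignored, later entries overwriting earlier ones) could be modelled like
this. It is a Python sketch for illustration only; the table, the
function name, the warning text and the rejection of an empty value are
assumptions, not the behaviour of the actual implementation.

```python
import re
import sys

# Hypothetical in-memory table; pdftex's real data structure is not shown.
table = {}

def add_entry(glyph_name, value):
    """Record glyph_name -> value; return False if the entry is discarded."""
    # Spaces are ignored in the second parameter.
    stripped = value.replace(" ", "")
    # Only uppercase hex digits may remain. Rejecting an empty value is an
    # assumption of this sketch.
    if not re.fullmatch(r"[0-9A-F]+", stripped):
        print(f"warning: invalid value {value!r} for glyph {glyph_name!r}, "
              "entry discarded", file=sys.stderr)
        return False
    # A later entry with the same name overwrites the earlier one.
    table[glyph_name] = stripped
    return True

add_entry("ff", "0066 0066")  # stored as "00660066"
add_entry("A", "00e9")        # discarded: lowercase hex digit
```

Under these assumptions both spellings of the 'ff' entry above end up
stored as the same string.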


----------------------------------------------------------------------

Comment By: The Thanh Han (hanthethanh)
Date: 2006-07-19 06:08

Message:
Logged In: YES 
user_id=710

patch updated to fix a bug reported by Dohyun Kim

----------------------------------------------------------------------

Comment By: The Thanh Han (hanthethanh)
Date: 2006-07-16 11:19

Message:
Logged In: YES 
user_id=710

patch updated with a bug fix from Akira

----------------------------------------------------------------------
