Hi Karl,

Thanks for looking at this.

On Jun 13, 2017, at 9:10 AM, Karl Berry <karl at freefriends.org> wrote:

   rm> ... test the full name
   (including any .suffix parts) first, for a database entry.  If found, use it.
   Otherwise, try again using just the prefix (as at present).

That surely sounds sensible.

   Or in case a name is multiply qualified; e.g.,
             delta.sc.ipa         (occurs in  cmu-tipx.enc )   also
             omega.sc.ipa   q.sc.ipa   f.sc.ipa
   then drop off the qualifications from the end.
   So test in order:   delta.sc.ipa   delta.sc   delta
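
To make that order of tests concrete, here is a rough sketch in Python. It is only an illustration of the proposed fallback, not pdftex's actual code: the function names and the small table are hypothetical, standing in for whatever has been registered via \pdfglyphtounicode.

```python
def candidate_names(glyph_name):
    """Yield the full glyph name, then shorter names obtained by
    dropping '.'-separated qualifications from the end."""
    name = glyph_name
    while name:
        yield name
        head, dot, _ = name.rpartition(".")
        if not dot:
            break          # no qualification left to drop
        name = head        # an empty head (e.g. from ".notdef") ends the loop


def lookup_tounicode(glyph_name, table):
    """Return the first table entry found for the name or any of its
    shortened forms, or None when nothing matches."""
    for name in candidate_names(glyph_name):
        if name in table:
            return table[name]
    return None


# Hypothetical entries, as an author or package might register them.
EXAMPLE_TABLE = {
    "a": "0061",      # LATIN SMALL LETTER A
    "a.sc": "1D00",   # LATIN LETTER SMALL CAPITAL A
    "delta": "03B4",  # GREEK SMALL LETTER DELTA
}

if __name__ == "__main__":
    for glyph in ("a.sc", "delta.sc.ipa", "q.sc.ipa"):
        print(glyph, "->", lookup_tounicode(glyph, EXAMPLE_TABLE))
```

With this table, 'a.sc' gets its own entry, 'delta.sc.ipa' falls back to the entry for 'delta', and 'q.sc.ipa' finds nothing at all.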


Thanh, can you confirm that we should go ahead with this plan?

The point is that Unicode is not about glyphs, but characters.
This is stated very clearly in the Unicode documentation, and on numerous websites.
Here ‘character’ should be interpreted as ‘how a (collection of) glyph(s) is used’.

Thus ‘a’ and ‘a.sc’ are used differently; the latter usually for some form of emphasis,
as in section headings or running headers, rather than the body text.
So it’s not just a matter of a different font style for these words.
It’s quite reasonable for Copy/Paste and screen- or Braille-readers to be able to
detect this difference, via the  /ToUnicode  map.

Using just the glyph name ‘a’ is fine as a fall-back, but it should be up to the document
author or package-writer to map the full name to a more descriptive Unicode code point,
when this is appropriate. At present  pdftex  prevents this when using  \pdfglyphtounicode .

The only alternative, so far as I can see, is to provide an explicit CMap file, after having
temporarily set  \pdfgentounicode=0  when the font is loaded. It is hard to get this right,
and hard to find the right place to hook in the relevant coding.

Actually, another possibility is editing both the font and its encoding file, something that
I’m sure we don’t want people doing.


Hope this helps.



