accessing glyphs in the private area


Ulrike Fischer

30 Sep 2018

2:08 p.m.

The font Coelacanth (on CTAN) has glyphs in the private area. Between 2/2017 (luaotfload in TeX Live 2017) and now, the storing and accessing of these glyphs has changed.

In the lua of the font of 2017 I find e.g.

[62860]={ ["boundingbox"]=165, ["index"]=2622, ["unicode"]=62860, ["width"]=523,

and the glyph can be accessed with \Uchar62860.

In the current lua I now find

[983910]={ ["boundingbox"]=195, ["index"]=2622, ["unicode"]=62860, ["width"]=523, },

and \Uchar62860 no longer works; one has to use \Uchar983910.

Is this change intentional? How is one supposed to access such chars? The manual says about \Uchar that it "expands to the associated Unicode character", but this no longer seems to be true.

A ConTeXt example to test is

\starttext
\font\test={name:Coelacanth:mode=node;script=latn;language=DFLT;+tlig;}
\test
1.: \Uchar62860

2.: \Uchar983910

\stoptext

The question was triggered by this tex.sx question: https://tex.stackexchange.com/questions/453224/using-glyphs-in-the-corporate...

https://github.com/u-fischer/luaotfload/issues/7

--
Ulrike Fischer
https://www.troubleshooting-tex.de/
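In both cache entries the glyph keeps `index` 2622 and `unicode`=62860. Only the table key changed, from 62860 to 983910 (hex 0xF0366, which lies in Supplementary Private Use Area-A, U+F0000–U+FFFFD). Because \Uchar addresses the key, the old number can only be recovered through the stored `unicode` field. Below is a minimal sketch in Python that models the Lua table as a dict. The helper name is hypothetical, and this is not how luaotfload itself resolves characters.

```python
# Python model of the two Lua cache entries quoted above.
# Keys play the role of slots (what \Uchar and \char address);
# only the fields shown in the message are included.
characters_2017 = {62860: {"index": 2622, "unicode": 62860, "width": 523}}
characters_2018 = {983910: {"index": 2622, "unicode": 62860, "width": 523}}


def slot_for_unicode(characters, codepoint):
    # Hypothetical helper (not a luaotfload function): return the slot whose
    # stored "unicode" field equals codepoint, or None if there is none.
    for slot, glyph in characters.items():
        if glyph.get("unicode") == codepoint:
            return slot
    return None


print(slot_for_unicode(characters_2017, 62860))  # 62860: slot and code point agree
print(slot_for_unicode(characters_2018, 62860))  # 983910: the glyph was moved
print(hex(983910))                               # 0xf0366, in plane 15
```

This is a linear search, like the \byindex macro later in the thread. It works, but it needs the font table at run time rather than a fixed number.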


Hans Hagen

1 Oct 2018

2:20 a.m.

On 9/30/2018 10:08 PM, Ulrike Fischer wrote:

...

The font Coelacanth (on CTAN) has glyphs in the private area.

Between 2/2017 (luaotfload in TeX Live 2017) and now, the storing and accessing of these glyphs has changed.

In the lua of the font of 2017 I find e.g.

[62860]={ ["boundingbox"]=165, ["index"]=2622, ["unicode"]=62860, ["width"]=523,

and the glyph can be accessed with \Uchar62860

In the current lua I now find

[983910]={ ["boundingbox"]=195, ["index"]=2622, ["unicode"]=62860, ["width"]=523, },

and \Uchar62860 no longer works; one has to use \Uchar983910.

Is this change intentional? How is one supposed to access such chars? The manual says about \Uchar that it "expands to the associated Unicode character", but this no longer seems to be true.

A ConTeXt example to test is

\starttext
\font\test={name:Coelacanth:mode=node;script=latn;language=DFLT;+tlig;}
\test
1.: \Uchar62860

2.: \Uchar983910

\stoptext

\Uchar expands to the character in the font, so to whatever sits in that slot ... In fact, fonts in LuaTeX are not that different from traditional TeX: slot 123 can be anything, but it happens that we use Unicode in the fontloader.

Anyway, the problem with these private areas is that they are also used by the loader (and ConTeXt), so in order to avoid clashes we move all private chars in the font to a dedicated private range.

In your case the glyphs have no really useful names, so basically I wonder what their use is (are they meant for direct access?).

You can define

\def\byindex#1{\ctxlua{
  for k, v in pairs(fonts.hashes.identifiers[true].characters) do
    if v.index == #1 then
      tex.print(utf.char(k))
      break
    end
  end
}}

{\definedfont[Coelacanth] test \byindex{\number"00A33}}

I can remap those privates to a normalized private name, like P0F581, but it depends on how bloated fonts with lots of privates become. In that case you can have:

\def\byname#1{\ctxlua{
  for k, v in pairs(fonts.hashes.identifiers[true].shared.rawdata.descriptions) do
    if v.name == "#1" then
      tex.print(utf.char(k))
      break
    end
  end
}}

{\definedfont[Coelacanth] test \byname{P0F581}}

(By the way, this code is not for ConTeXt users! They have other means; this is typically stuff that differs per macro package. One might for instance make a list per font with meaningful names that can be accessed in a more friendly way.)

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------

Ulrike Fischer

3:42 a.m.

Am Mon, 1 Oct 2018 10:20:07 +0200 schrieb Hans Hagen:

...

anyway, the problem with these private areas is that they are also used by the loader (and context) so in order to avoid clashes we move all private chars in the font to a dedicated private range

This basically means that for every document and package which uses the generic fontloader, access to chars in the private area with \char is now broken in LuaTeX (in XeTeX it still works fine). I just got a bug report from Claudio Beccari (who seems to have complained to Luigi) that the Libertine fonts no longer show some of the keyboard-key glyphs due to the same problem. Can you tell me when this change happened? Perhaps I can build an older fontloader as a fallback.

...

in your case the glyphs have no really useful names, so basically I wonder what their use is (are they meant for direct access?)

The question on tex.sx claimed that it has the name uniF58C. I never used the font and don't know how Thérèse accessed the glyphs before, but the libertine package has long lists of mappings like this:

\DeclareTextGlyphY{LinBiolinum_K}{uniE18C}{57740}

How do ConTeXt users access such glyphs? Why is there no problem?
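The numbers here follow from the glyph names. uniE18C names hex E18C, which is 57740 decimal, the value the Libertine declaration hands to \char. Likewise, the Coelacanth glyph named uniF58C is 0xF58C = 62860, the key of the 2017 cache entry. The Python sketch below converts such `uniXXXX` names to code points and checks whether they fall in the BMP private use area. Accepting a capitalized `Uni` prefix is an assumption made because Coelacanth reportedly mixes both spellings, and the helper name is illustrative.

```python
import re

# BMP Private Use Area, U+E000..U+F8FF (inclusive).
BMP_PUA = range(0xE000, 0xF8FF + 1)

# "uni" (or "Uni") followed by exactly four hex digits.
UNI_NAME = re.compile(r"^[Uu]ni([0-9A-Fa-f]{4})$")


def codepoint_from_name(name):
    # Return the code point encoded in a uniXXXX glyph name, or None
    # if the name does not follow that convention (e.g. "P0F581").
    match = UNI_NAME.match(name)
    return int(match.group(1), 16) if match else None


for name in ("uniE18C", "uniF58C", "Uni0041", "P0F581"):
    cp = codepoint_from_name(name)
    in_pua = cp is not None and cp in BMP_PUA
    print(name, cp, in_pua)
```

A name only yields a code point when the font follows this convention, so this complements number-based access rather than replacing it.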

...

you can define

\def\byindex#1{\ctxlua{
  for k, v in pairs(fonts.hashes.identifiers[true].characters) do
    if v.index == #1 then
      tex.print(utf.char(k))
      break
    end
  end
}}

{\definedfont[Coelacanth] test \byindex{\number"00A33}}

I don't see a use for accessing these glyphs by index: index positions can change if the font is updated. This can only be a last resort for glyphs without a Unicode position. The only sensible access is by Unicode number (which works).

...

I can remap those privates to a normalized private name, like P0F581 but it depends on how bloated fonts become that have lots of privates.

...

In that case you can have:

\def\byname#1{\ctxlua{
  for k, v in pairs(fonts.hashes.identifiers[true].shared.rawdata.descriptions) do
    if v.name == "#1" then
      tex.print(utf.char(k))
      break
    end
  end
}}

{\definedfont[Coelacanth] test \byname {P0F581}}

It would at least mean that the whole characters list doesn't have to be searched. And we could create a documented and stable access command.
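The point about not searching the whole list can be made concrete with a reverse index. Instead of the linear `for k, v in pairs(...)` loop in \byname, a name-to-slot table is built once per font, and each name lookup becomes a single hash access. Below is a Python sketch over a hypothetical `descriptions` dict shaped like the Lua table used above (slot -> entry with a "name" field). It illustrates the idea and is not luaotfload code.

```python
def build_name_index(descriptions):
    # Map glyph name -> slot, built once per font. 'descriptions' models the
    # Lua table keyed by slot, each entry carrying a "name" field.
    # If two slots share a name, the lowest slot wins.
    index = {}
    for slot in sorted(descriptions):
        name = descriptions[slot].get("name")
        if name is not None and name not in index:
            index[name] = slot
    return index


# Hypothetical sample data: one remapped private glyph and an ordinary letter.
descriptions = {
    983910: {"name": "uniF58C"},
    65: {"name": "A"},
}

names = build_name_index(descriptions)
print(names.get("uniF58C"))  # 983910
print(names.get("missing"))  # None
```

The cost is one extra table per font, which relates to the memory concern raised in this thread about keeping names for fonts with many private glyphs.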

luigi scarso

3:53 a.m.

On Mon, Oct 1, 2018 at 11:43 AM Ulrike Fischer wrote:

...

I just got from Claudio Beccari (who seems to have complained to Luigi)

Hm, not a complaint, a simple "bug report". I always try to answer Claudio directly when/if I can, but in this case, if I have understood correctly, you are now the/a maintainer of luaotfload, and you can certainly give much better answers than me. -- luigi

Hans Hagen

11:29 a.m.

On 10/1/2018 11:42 AM, Ulrike Fischer wrote:

...

Can you tell me when this change happened? Perhaps I can build an older fontloader as a fallback.

No, probably a while ago when some other clash in private-area use was solved. I'm not going to mess with the code now, as 0xE000-0xEFFF is used in ConTeXt for various things.

...

...
in your case the glyphs have no really useful names, so basically I wonder what their use is (are they meant for direct access?)

The question on tex.sx claimed that it has the name uniF58C. I never used the font and don't know how Thérèse accessed the glyphs before, but the libertine package has long lists of mappings like this:

\DeclareTextGlyphY{LinBiolinum_K}{uniE18C}{57740}

A funny definition ... is that access by name or number?

...

I don't see a use for accessing these glyphs by index: index positions can change if the font is updated. This can only be a last resort for glyphs without a Unicode position.

So can private Unicode code points, as they are just as undefined.

...

The only sensible access is by Unicode number (which works).

Anyway, for generic use (so not for ConTeXt) I can keep these glyphs in the 0xE000-0xEFFF range for now (also the names, so larger files ... actually a mess, as that font has both Uni and uni, so who's to know). I have no clue whether keeping them there, so that you can use numbers, clashes with some features at some point, but those features are probably ConTeXt-specific anyway.

Hans

Ulrike Fischer

11:55 a.m.

Am Mon, 1 Oct 2018 19:29:40 +0200 schrieb Hans Hagen:

...

...
\DeclareTextGlyphY{LinBiolinum_K}{uniE18C}{57740}

...

A funny definition ... is that access by name or number?

By number, it uses \char at the end to get the glyph.

...

Anyway, for generic (so not for context) I can keep these glyphs in the 0xE000-0xEFFF range for now

That would be great. Thanks.

What about the rest and the other PUAs? U+F000–U+F8FF, U+F0000–U+FFFFD, U+100000–U+10FFFD.

And when will the change be available? I will have to update luaotfload.
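Together with U+E000–U+EFFF, the ranges listed above form the three Unicode private use areas: the BMP area U+E000–U+F8FF, Supplementary Private Use Area-A (plane 15), and Supplementary Private Use Area-B (plane 16). The range bounds below come from the Unicode standard. The function and label names are illustrative. A classifier like this could, for instance, check which glyphs of a font fall outside the 0xE000-0xEFFF block mentioned above.

```python
# Unicode private use areas (inclusive bounds).
PRIVATE_USE_AREAS = {
    "BMP PUA": (0xE000, 0xF8FF),
    "Supplementary PUA-A (plane 15)": (0xF0000, 0xFFFFD),
    "Supplementary PUA-B (plane 16)": (0x100000, 0x10FFFD),
}


def private_area(codepoint):
    # Return the label of the private use area containing codepoint, or None.
    for label, (low, high) in PRIVATE_USE_AREAS.items():
        if low <= codepoint <= high:
            return label
    return None


for cp in (57740, 62860, 983910, 0x41):
    print(f"U+{cp:04X}", private_area(cp))
```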

...

(also the names so larger files ... actually a mess as that font has Uni and uni so who's to know).

It is not only that font. Actually, the libertine package broke, fontawesome broke, and Coelacanth was only used by Thérèse in the example because it is free; her real problem was with using Goudy fleurons.

The private use area is there for every font to use. And quite often fonts document these code points, and tools like FontForge show them. I don't think it is a good idea for an application to come along and push the glyphs out of their seats because it wants the space for itself.

...

I have no clue if it clashes with some features at some point so that you can use numbers but probably those features are context specific anyway.

For what do you reserve the space in the PUA?

Hans Hagen

2:42 p.m.

On 10/1/2018 7:55 PM, Ulrike Fischer wrote:

...

Am Mon, 1 Oct 2018 19:29:40 +0200 schrieb Hans Hagen:

...
...
\DeclareTextGlyphY{LinBiolinum_K}{uniE18C}{57740}

...
A funny definition ... is that access by name or number?

By number, it uses \char at the end to get the glyph.

...
Anyway, for generic (so not for context) I can keep these glyphs in the 0xE000-0xEFFF range for now

That would be great. Thanks.

What about the rest and the other PUAs? U+F000–U+F8FF, U+F0000–U+FFFFD, U+100000–U+10FFFD

Only U+F000–U+F8FF, as I'm not in the mood to write code that skips over the other ones (we need code points for alternates and such, and I also need consistent room for virtual chars). So, if someone really needs those slots, he/she should remap them somehow.

If you really want, I can keep their names if there are names (UniXXXXXX), but that would mean way more memory for CJK fonts (then it's all or nothing for LaTeX; for plain generic I won't do that anyway). But I don't expect those areas to be used for usable glyphs. (In fact, I would probably never rely on numbers in such private areas myself, or I would write some plug-in for the loader that remaps them to areas I want them in ... I prefer glyph names.)

...

And when will the change be available? I will have to update luaotfload.

Dunno; when I have some more to upload (sometime this week).

...

...
(also the names so larger files ... actually a mess as that font has Uni and uni so who's to know).

It is not only that font. Actually, the libertine package broke, fontawesome broke, and Coelacanth was only used by Thérèse in the example because it is free; her real problem was with using Goudy fleurons.

In ConTeXt I strongly advise against using numbers instead of names.

...

The private use area is there for every font to use. And quite often fonts document these code points, and tools like FontForge show them. I don't think it is a good idea for an application to come along and push the glyphs out of their seats because it wants the space for itself.

Well, we do need space in a valid area.

...

...
I have no clue if it clashes with some features at some point so that you can use numbers but probably those features are context specific anyway.

For what do you reserve the space in the PUA?

All kinds of stuff (for instance, we have to put substitutes, alternates, etc. somewhere; I have also been using it for virtual math fonts for over a decade now), and I'm definitely not going to move all kinds of already used slots around now (I might do that some day, as I do have some abstract model, but then I also need to do a lot of testing).

I also need the same slots in all fonts for some purposes, so I need some shared private space. (In fact, if I need the higher-space glyphs I can always decide to use names, but first I need to run into a real font using these slots in order to see what the impact is.)

Hans

luigi scarso

10:55 p.m.

On Mon, Oct 1, 2018 at 7:56 PM Ulrike Fischer wrote:

...

For what do you reserve the space in the PUA?

http://www.pragma-ade.nl/general/manuals/fonts-mkiv.pdf, page 32 of the document:

As we already mentioned in a previous chapter, in ConTeXt we use Unicode internally. This also means that fonts are organized this way. By default the glyph representation of a Unicode character sits in the same slot in the glyph table. All additional glyphs, like ligatures or alternates are pushed in the private unicode space. This is why in the lists shown in the figures the ligatures have a private Unicode number.

-- luigi

Ulrike Fischer

2 Oct 2018

1:29 a.m.

Am Tue, 2 Oct 2018 06:55:02 +0200 schrieb luigi scarso:

...

...
For what do you reserve the space in the PUA?

...

http://www.pragma-ade.nl/general/manuals/fonts-mkiv.pdf, page 32 of the document:

...

As we already mentioned in a previous chapter, in ConTeXt we use Unicode internally. This also means that fonts are organized this way. By default the glyph representation of a Unicode character sits in the same slot in the glyph table. All additional glyphs, like ligatures or alternates are pushed in the private unicode space. This is why in the lists shown in the figures the ligatures have a private Unicode number.

Hm. To clarify: in XeTeX there is a clear distinction between the slot and the Unicode code point. \XeTeXglyph (slot) and \char (Unicode) give different output, and \char actively uses the tounicode mapping of the font:

\font\test="[lmroman10-regular.otf]"
\test
\XeTeXglyph"7A \char"7A
\bye

In LuaTeX, \char and \Uchar don't really care about Unicode. Even if the font has tounicode=1 and tounicode entries, they access the char by the hashed integer number. So to get "Unicode", the font loader has to sort the glyphs, index Unicode glyphs by their Unicode code point, and assign numbers that don't interfere to "non-Unicode" glyphs. Did I get that right?

Then I do understand that you need some free numbers to push glyphs to. But I do not understand why, to achieve this, you remove glyphs from their Unicode points. The PUA is not some non-Unicode wilderness. The code points there are as valid as those in the other code blocks. You wouldn't move away the Greek block to get the space, so why do you think it is okay to throw out of the PUA block what SIL and other font designers encoded there? Can't you check for a free range instead?
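The "free range" suggestion can be sketched as an allocator. Every encoded glyph stays at its own code point, including PUA code points. Unencoded glyphs (ligatures, alternates, pieces of math extensibles) get slots from a private range, and the allocator skips any code point the font already occupies. This is a hypothetical illustration in Python, not the luaotfload or ConTeXt algorithm. The starting point 0xF0000 and the sample glyph names are arbitrary choices for the example.

```python
def assign_slots(encoded, unencoded_names, start=0xF0000, end=0xFFFFD):
    # encoded: dict code point -> glyph name; these keep their own code points.
    # unencoded_names: names of glyphs without a code point, in font order.
    # Returns a new dict code point -> glyph name.
    # Raises RuntimeError if [start, end] has no free slot left.
    slots = dict(encoded)
    candidate = start
    for name in unencoded_names:
        # Skip code points the font itself already uses.
        while candidate in slots:
            candidate += 1
        if candidate > end:
            raise RuntimeError("no free private slot left")
        slots[candidate] = name
        candidate += 1
    return slots


# Hypothetical font: one BMP PUA glyph and one glyph already in plane 15.
font = {0xF58C: "uniF58C", 0xF0000: "uF0000"}
result = assign_slots(font, ["f_f_i", "a.alt"])
for cp in sorted(result):
    print(f"U+{cp:04X}", result[cp])
```

A per-font allocator like this cannot guarantee that every font gets the same private slots, which Hans says in his previous message ConTeXt relies on for some purposes. That trade-off is what the rest of the thread is about.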

Hans Hagen

3:29 a.m.

On 10/2/2018 9:29 AM, Ulrike Fischer wrote:

...

Am Tue, 2 Oct 2018 06:55:02 +0200 schrieb luigi scarso:

...
...
For what do you reserve the space in the PUA?

...
http://www.pragma-ade.nl/general/manuals/fonts-mkiv.pdf, page 32 of the document:

...
As we already mentioned in a previous chapter, in ConTeXt we use Unicode internally. This also means that fonts are organized this way. By default the glyph representation of a Unicode character sits in the same slot in the glyph table. All additional glyphs, like ligatures or alternates are pushed in the private unicode space. This is why in the lists shown in the figures the ligatures have a private Unicode number.

Hm. To clarify: in XeTeX there is a clear distinction between the slot and the Unicode code point. \XeTeXglyph (slot) and \char (Unicode) give different output, and \char actively uses the tounicode mapping of the font:

\font\test="[lmroman10-regular.otf]"
\test
\XeTeXglyph"7A \char"7A
\bye

In LuaTeX, \char and \Uchar don't really care about Unicode. Even if the font has tounicode=1 and tounicode entries, they access the char by the hashed integer number.

They access the char in the characters table (where each character has an index field, so one can write a simple function that accesses it by index). Also, I assume that in XeTeX \char gives the character as known to TeX, so if one inputs non-Unicode, one gets that.

...

So to get "Unicode", the font loader has to sort the glyphs, index Unicode glyphs by their Unicode code point, and assign numbers that don't interfere to "non-Unicode" glyphs.

Did I get that right?

Indeed, and we use the private space for those with no Unicode code point (which can be a lot; think, for instance, of the snippets that make up math extensibles).

...

Then I do understand that you need some free numbers to push glyphs to. But I do not understand why, to achieve this, you remove glyphs from their Unicode points. The PUA is not some non-Unicode wilderness. The code points there are as valid as those in the other code blocks. You wouldn't move away the Greek block to get the space, so why do you think it is okay to throw out of the PUA block what SIL and other font designers encoded there? Can't you check for a free range instead?

Sure, but then I also lose some functionality in ConTeXt (unless I go for ugly solutions) ... As all glyphs are supposed to have a name, access by name is a pretty good alternative.

The main issue is that there are fonts that use private space > 0xFFFF, which would then mean a lot of extra memory for names ... so the question is: are there fonts that use that range?

Hans

Ulrike Fischer

5:39 a.m.

Am Tue, 2 Oct 2018 11:29:46 +0200 schrieb Hans Hagen:

...

...
Can't you check for a free range instead?

...

sure, but then i also lose some functionality in context (unless i go for ugly solutions) ... as all glyphs are supposed to have a name, access by name is a pretty good alternative

Well, in my view name and code point are both valid and useful means of access (and I wouldn't trust names too much).

Besides this: XeTeX has (for non-legacy fonts) primitives for all kinds of access: by char (Unicode), by slot, and by name. LuaTeX hasn't; here the only (primitive) access is through commands like \char, which expect a number, and the name field of a character is marked as "unused" in the manual. Nor does the generic fontloader, imho, offer a suitable primitive command for name access. All the examples in the generic folder use numbers or direct input, e.g. \Uchar"1D49D or \Uradical "0 "221A.

So it is imho quite natural that people who write code and packages expect access by \char + code point to work. Why should I bother with a (perhaps font-specific) glyph name if I can simply look up a clear code point number in a table?

And if I got it right, you are reserving a specific space to have stable numbers internally, so you care about numbers too ;-)

...

the main issue is that there are fonts that use private > 0xFFFF space

I don't know. Wikipedia says that code2000 uses plane 15 but I didn't check.

Hans Hagen

6:42 a.m.

On 10/2/2018 1:39 PM, Ulrike Fischer wrote:

...

Am Tue, 2 Oct 2018 11:29:46 +0200 schrieb Hans Hagen:

...
...
Can't you check for a free range instead?

...
sure, but then i also lose some functionality in context (unless i go for ugly solutions) ... as all glyphs are supposed to have a name, access by name is a pretty good alternative

Well in my view name and code point are both valid and useful accesses (and I wouldn't trust names too much).

Besides this: XeTeX has (for non-legacy fonts) primitives for all kinds of access: by char (Unicode), by slot, and by name.

whatever ...

...

LuaTeX hasn't; here the only (primitive) access is through commands like \char, which expect a number, and the name field of a character is marked as "unused" in the manual.

Sure, as one can write Lua code to provide that feature ... there is no benefit in having that code in the engine (in fact, even more could go).

...

Nor does the generic fontloader, imho, offer a suitable primitive command for name access. All the examples in the generic folder use numbers or direct input, e.g. \Uchar"1D49D or \Uradical "0 "221A.

One can write these helpers ... I consider those things macro-package dependent, as there's often some higher-level interface.

...

So it is imho quite natural that people who write code and packages expect the access by \char + code point to work. Why should I bother with a (perhaps font specific) glyph name if I can simply look up a clear code point number in a table?

OK, so it depends on the users and the viewpoints of macro package writers ... if some extra glyph cannot be given a meaningful name, it's probably not worth using anyway.

...

And if I got it right you are reserving a specific space to have stable numbers internally, so you are caring about numbers too ;-)

Symbolic mapping, and for text not hard-coded (and shared, therefore efficient), but I shifted that space up and hope for the best (for ConTeXt users, that is, as I cannot test a lot now).

...

...
the main issue is that there are fonts that use private > 0xFFFF space

I don't know. Wikipedia says that code2000 uses plane 15 but I didn't check.

Anyway ... I adapted the code to keep the PUA intact and also added an option for use outside ConTeXt to keep bogus names ... (ConTeXt users have several ways to access shapes anyway).

Hans


11 comments

3 participants

participants (3)

Hans Hagen
luigi scarso
Ulrike Fischer