tex-text (XeTeX) & trep, tlig (LuaTeX)

Mojca Miklavec

5 Dec 2007

11:36 a.m.

Hello Hans,

I compared tex-text and trep, tlig. Since you map features=default to trep and tlig, and both of them further to tex-text (twice), tex-text could be split in two parts as well, so that you can make one-to-one mapping.

Should I make those changes and send you the two files? (Twice 300 bytes to be added to ConTeXt + source, which is approximately the same as written below.)

1.) TREP

function fonts.initializers.base.otf.texquotes(tfm,value)
  tfm.characters[0x0022] = table.fastcopy(tfm.characters[0x201D])
  tfm.characters[0x0027] = table.fastcopy(tfm.characters[0x2019])
  tfm.characters[0x0060] = table.fastcopy(tfm.characters[0x2018])
end

Corresponding lines from XeTeX:

U+0022  > U+201D ; " -> right double quote
U+0027 <> U+2019 ; ' -> right single quote
U+0060 <> U+2018 ; ` -> left single quote

2.) TLIG

{ "endash", "hyphen hyphen" },
U+002D U+002D <> U+2013 ; -- -> en dash

{ "emdash", "hyphen hyphen hyphen" },
U+002D U+002D U+002D <> U+2014 ; --- -> em dash

{ "quotedblright", "quotesingle quotesingle" },
U+0027 U+0027 <> U+201D ; '' -> right double quote

{ "quotedblleft", "grave grave" },
U+0060 U+0060 <> U+201C ; `` -> left double quote

{ "quotedblbase", "comma comma" }
U+002C U+002C <> U+201E ; ,, -> DOUBLE LOW-9 QUOTATION MARK

missing from tex-text (not needed so far):

{ "quotedblleft", "quoteleft quoteleft" },
0x2018 0x2018 <> 0x201C ; 2x left single quote -> left double quote

{ "quotedblright", "quoteright quoteright" },
0x2019 0x2019 <> 0x201D ; 2x right single quote -> right double quote

3.) Only in XeTeX's tex-text (Do people need it?):

U+0021 U+0060 <> U+00A1 ; !` -> inverted exclam
U+003F U+0060 <> U+00BF ; ?` -> inverted question
U+003C U+003C <> U+00AB ; << -> LEFT POINTING GUILLEMET
U+003E U+003E <> U+00BB ; >> -> RIGHT POINTING GUILLEMET

Plus, there's a little problem with this patch that I have sent you (it's not in stable yet, so it might make sense to fix it before releasing):

\definefontsynonym[Dummy]          [name:\typescripttwo]   [features=default]
\definefontsynonym[DummyItalic]    [name:\typescripttwo/I] [features=default]
\definefontsynonym[DummyBold]      [name:\typescripttwo/B] [features=default]
\definefontsynonym[DummyBoldItalic][name:\typescripttwo/BI][features=default]
\definefontsynonym[DummyCaps]      [name:\typescripttwo]   [features=smallcaps]

The problem is that "features=default" implies "script=latn", which is not always desired. A copy of mapping=tex-text comes from tlig & trep substitution.

I assume that script=latn;language=dflt;+liga;+kern; is always on by default (where needed), so basically mapping=tex-text is the only thing that really needs to be added.

[Iwona-Bold.otf]:script=latn;language=dflt;+liga;+kern;mapping=tex-text;mapping=tex-text;

Some (non-latin) fonts complain when one requests non-existing features.

Also, it might be handy to be able to define \definetypeface[basic][rm][Xserif][whatever][script=arab,language=...] (for now forget that one, interface needs to be extended once and properly).

\definefontsynonym[a][file:Iwona-Bold.otf][mapping=tex-text] doesn't work, so the only way seems to be

\definefontfeature[xetex][mapping=tex-text]
\definefontsynonym[a][file:Iwona-Bold.otf][xetex]

I have tried to use

\definefontfeature[xetex][mapping=tex-text]
\definefontfeature[caps][+smcp]
\definefontsynonym[a][file:Iwona-Bold.otf][features={xetex,caps}]

but that didn't work.

Mojca
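As a rough illustration of the two features compared in this message, the Python sketch below models trep as one-to-one character replacement and tlig as replacement of character sequences, trying the longest sequence first so that --- wins over --. The table contents are taken from the listing above; the function names apply_trep and apply_tlig are invented for this sketch and are not ConTeXt, LuaTeX or XeTeX API.

```python
# Hypothetical model of trep and tlig; not ConTeXt code.

# trep: single characters replaced one-to-one (from the TREP listing).
TREP = {
    "\u0022": "\u201D",  # " -> right double quote
    "\u0027": "\u2019",  # ' -> right single quote
    "\u0060": "\u2018",  # ` -> left single quote
}

# tlig: character sequences replaced by a single character (from the TLIG listing).
TLIG = {
    "---": "\u2014",           # em dash
    "--": "\u2013",            # en dash
    "''": "\u201D",            # right double quote
    "``": "\u201C",            # left double quote
    ",,": "\u201E",            # double low-9 quotation mark
    "\u2018\u2018": "\u201C",  # 2x left single quote
    "\u2019\u2019": "\u201D",  # 2x right single quote
}


def apply_tlig(text):
    """Replace sequences, trying the longest candidate first at each position."""
    keys = sorted(TLIG, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(TLIG[key])
                i += len(key)
                break
        else:
            # No sequence starts here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


def apply_trep(text):
    """Replace each character that has a trep entry; leave the rest alone."""
    return "".join(TREP.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(apply_trep(apply_tlig("``I'm here'' -- or not --- who knows")))
```

The order in which the two steps run is a choice made for this sketch; the thread does not say how either engine orders them.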


Hans Hagen

5 Dec

12:18 p.m.

Mojca Miklavec wrote:

...

I compared tex-text and trep, tlig. Since you map features=default to trep and tlig, and both of them further to tex-text (twice), tex-text could be split in two parts as well, so that you can make one-to-one mapping.

there is a difference here between xetex and luatex

...

Should I make those changes and send you the two files? (Twice 300 bytes to be added to ConTeXt + source, which is approximately the same as written below.)

first i need to understand the problem; actually i'm even thinking of not defaulting (in luatex) the mapping other than -- and --- because they are (1) not sensible and (2) users should use quotation commands and/or (3) use the proper utf codes

...

1.) TREP

function fonts.initializers.base.otf.texquotes(tfm,value)
  tfm.characters[0x0022] = table.fastcopy(tfm.characters[0x201D])
  tfm.characters[0x0027] = table.fastcopy(tfm.characters[0x2019])
  tfm.characters[0x0060] = table.fastcopy(tfm.characters[0x2018])
end

Corresponding lines from XeTeX:

U+0022  > U+201D ; " -> right double quote
U+0027 <> U+2019 ; ' -> right single quote
U+0060 <> U+2018 ; ` -> left single quote

these i hate most, and personally never use them ... if i key in a char explicitly i want that char and not another

...

2.) TLIG

{ "endash", "hyphen hyphen" },
U+002D U+002D <> U+2013 ; -- -> en dash

{ "emdash", "hyphen hyphen hyphen" },
U+002D U+002D U+002D <> U+2014 ; --- -> em dash

{ "quotedblright", "quotesingle quotesingle" },
U+0027 U+0027 <> U+201D ; '' -> right double quote

{ "quotedblleft", "grave grave" },
U+0060 U+0060 <> U+201C ; `` -> left double quote

{ "quotedblbase", "comma comma" }
U+002C U+002C <> U+201E ; ,, -> DOUBLE LOW-9 QUOTATION MARK

missing from tex-text (not needed so far)

actually there's even space + something becomes something else

...

{ "quotedblleft", "quoteleft quoteleft" }, 0x2018 0x2018 <> 0x201C ; 2x left single quote -> left double quote

{ "quotedblright", "quoteright quoteright" }, 0x2019 0x2019 <> 0x201D ; 2x right single quote -> right double quote

and then those spanish ...

...

3.) Only in XeTeX's tex-text (Do people need it?):

U+0021 U+0060 <> U+00A1 ; !` -> inverted exclam
U+003F U+0060 <> U+00BF ; ?` -> inverted question
U+003C U+003C <> U+00AB ; << -> LEFT POINTING GUILLEMET
U+003E U+003E <> U+00BB ; >> -> RIGHT POINTING GUILLEMET

let's get rid of it

...

Plus, there's a little problem with this patch that I have sent you (it's not in stable yet, so it might make sense to fix it before releasing):

\definefontsynonym[Dummy]          [name:\typescripttwo]   [features=default]
\definefontsynonym[DummyItalic]    [name:\typescripttwo/I] [features=default]
\definefontsynonym[DummyBold]      [name:\typescripttwo/B] [features=default]
\definefontsynonym[DummyBoldItalic][name:\typescripttwo/BI][features=default]

\definefontsynonym[DummyCaps] [name:\typescripttwo] [features=smallcaps]

i'd like to let caps and such go away completely for mkiv so maybe we end up with xetex defs versus luatex defs; i wonder if in practice users will use both at the same time (ok, you do)

...

The problem is that "features=default" implies "script=latn", which is not always desired. A copy of mapping=tex-text comes from tlig & trep substitution.

we can fall back to dflt which in practice boils down to latn

...

I assume that script=latn;language=dflt;+liga;+kern; is always on by default (where needed), so basically mapping=tex-text is the only thing that really needs to be added.

well, i'd prefer ... only -- and --- and make anything else up to the user, which means, redefining default in cont-sys if needed

...

[Iwona-Bold.otf]:script=latn;language=dflt;+liga;+kern;mapping=tex-text;mapping=tex-text;

two mappings?

...

Some (non-latin) fonts complain when one requests non-existing features.

in xetex you mean?

...

Also, it might be handy to be able to define \definetypeface[basic][rm][Xserif][whatever][script=arab,language=...] (for now forget that one, interface needs to be extended once and properly).

\definefontsynonym[a][file:Iwona-Bold.otf][mapping=tex-text] doesn't work, so the only way seems to be

\definefontfeature[xetex][mapping=tex-text]
\definefontsynonym[a][file:Iwona-Bold.otf][xetex]

I have tried to use

\definefontfeature[xetex][mapping=tex-text]
\definefontfeature[caps][+smcp]
\definefontsynonym[a][file:Iwona-Bold.otf][features={xetex,caps}]

but that didn't work.

indeed, handling comma separated lists is too slow there .. ok, we can do it for xetex and in luatex use lua for it ... or i could hash the commalist itself ... needs a bit of thinking but eventually we need to be able to combine features (this even more points into a separate definition file for xetex)

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
| www.pragma-pod.nl
-----------------------------------------------------------------
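The idea of hashing the comma list itself can be pictured with a small Python sketch: the merged result for a list such as xetex,caps is computed once and then looked up by the list string as written. Everything here (FEATURE_SETS, resolve_features, the dictionary layout) is an assumption made for illustration and says nothing about how ConTeXt stores feature definitions.

```python
# Hypothetical sketch of caching merged feature sets; not ConTeXt internals.

# Named feature sets, roughly what two \definefontfeature calls would register.
FEATURE_SETS = {
    "xetex": {"mapping": "tex-text"},
    "caps": {"smcp": "yes"},
}

# Cache keyed by the comma list exactly as the user wrote it.
_merged_cache = {}


def resolve_features(commalist):
    """Merge the named sets in a list like "xetex,caps"; later names win."""
    cached = _merged_cache.get(commalist)
    if cached is not None:
        return cached
    merged = {}
    for name in commalist.split(","):
        merged.update(FEATURE_SETS[name.strip()])
    _merged_cache[commalist] = merged
    return merged


if __name__ == "__main__":
    print(resolve_features("xetex,caps"))
```

With this layout the list is split and merged only on first use; every later font definition that names the same list pays for one dictionary lookup.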

Mojca Miklavec

6 Dec

3:09 p.m.

On 12/5/07, Hans Hagen wrote:

...

Mojca Miklavec wrote:

...
I compared tex-text and trep, tlig. Since you map features=default to trep and tlig, and both of them further to tex-text (twice), tex-text could be split in two parts as well, so that you can make one-to-one mapping.

there is a difference here between xetex and luatex

I know, that's why I'm volunteering to make XeTeX behave equal to LuaTeX (and asking for some opinions about it).

...

first i need to understand the problem;

In LuaTeX you define two "features":
- trep (3 replacements)
- tlig (8 ligatures)

In XeTeX, these 11 and 4 others are combined in "mapping=tex-text". And I would like the ligatures to be the same on both engines, so I would like to clone the behavior from LuaTeX in XeTeX (once you decide what the default should be - from your reply I understand that you plan to change the behaviour) and split "mapping=tex-text" into more separate "mappings".

...

actually i'm even thinking of not defaulting (in luatex) the mapping other than -- and --- because they are (1) not sensible and (2) users should use quotation commands and/or (3) use the proper utf codes

That's fine by me, I would agree. Only: what about apostrophe (')? (I'm, you're, ..)

...

...
1.) TREP

function fonts.initializers.base.otf.texquotes(tfm,value)
  tfm.characters[0x0022] = table.fastcopy(tfm.characters[0x201D])
  tfm.characters[0x0027] = table.fastcopy(tfm.characters[0x2019])
  tfm.characters[0x0060] = table.fastcopy(tfm.characters[0x2018])
end

Corresponding lines from XeTeX:

U+0022  > U+201D ; " -> right double quote
U+0027 <> U+2019 ; ' -> right single quote
U+0060 <> U+2018 ; ` -> left single quote

these i hate most, and personally never use them ... if i key in a char explicitly i want that char and not another

I'm not sure about it, but how is it with the apostrophe in I'm, don't, ... ? I don't care too much about " -> right double quote and ` -> left single quote esp. the first one should better be left that way, and I never use ` (and have no idea what it's used for except in mysql and TeX source code)

...

...
2.) TLIG

{ "endash", "hyphen hyphen" },
U+002D U+002D <> U+2013 ; -- -> en dash

{ "emdash", "hyphen hyphen hyphen" },
U+002D U+002D U+002D <> U+2014 ; --- -> em dash

{ "quotedblright", "quotesingle quotesingle" },
U+0027 U+0027 <> U+201D ; '' -> right double quote

{ "quotedblleft", "grave grave" },
U+0060 U+0060 <> U+201C ; `` -> left double quote

{ "quotedblbase", "comma comma" }
U+002C U+002C <> U+201E ; ,, -> DOUBLE LOW-9 QUOTATION MARK

missing from tex-text (not needed so far)

actually there's even space + something becomes something else

I don't mind those, TeX never uses space anyway.

...

...
{ "quotedblleft", "quoteleft quoteleft" }, 0x2018 0x2018 <> 0x201C ; 2x left single quote -> left double quote

{ "quotedblright", "quoteright quoteright" }, 0x2019 0x2019 <> 0x201D ; 2x right single quote -> right double quote

and then those spanish ...

...
3.) Only in XeTeX's tex-text (Do people need it?):

U+0021 U+0060 <> U+00A1 ; !` -> inverted exclam
U+003F U+0060 <> U+00BF ; ?` -> inverted question
U+003C U+003C <> U+00AB ; << -> LEFT POINTING GUILLEMET
U+003E U+003E <> U+00BB ; >> -> RIGHT POINTING GUILLEMET

let's get rid of it

Fine by me.

...

i'd like to let caps and such go away completely for mkiv so maybe we end up with xetex defs versus luatex defs;

So ... perhaps Caps support needs to be rewritten in XeTeX as well :) I still don't understand how to get "bold italic sans caps" for example (just as I don't know how to get bold/bold italic math). Do it your way in LuaTeX ... support for XeTeX can follow.

...

i wonder if in practice users will use both at the same time (ok, you do)

I use LuaTeX because:
- sometimes Lua is really handy and you have a marvellous database for Unicode well integrated :), nice to inspect fonts etc.

I use XeTeX because:
- LuaTeX doesn't (or at least didn't) always work as it should. XeTeX "saved my life" in May because some dirty LuaTeX bugs stopped the show in the middle (I already had a working copy - final PDF, and after minor modifications it stopped working)
- others keep asking questions (and since I suggested or approved quite some bugs in XeTeX recently, I feel responsible for helping the victims :)
- to show off with Zapfino - I don't have it as OpenType :-)

In any case: what I really love about ConTeXt is that [more-or-less] no change is needed to compile the same (simple) document with either engine (and compare the result or to switch quickly if support in one engine is buggy). When using lua that is no longer true, but still, same definitions and similar results with both engines would be nice. (Esp. if both engines will be merged in the future :)

...

...
The problem is that "features=default" implies "script=latn", which is not always desired. A copy of mapping=tex-text comes from tlig & trep substitution.

we can fall back to dflt which in practice boils down to latn

That's probably better.

...

...
I assume that script=latn;language=dflt;+liga;+kern; is always on by default (where needed), so basically mapping=tex-text is the only thing that really needs to be added.

well, i'd prefer ... only -- and --- and make anything else up to the user, which means, redefining default in cont-sys if needed

...
[Iwona-Bold.otf]:script=latn;language=dflt;+liga;+kern;mapping=tex-text;mapping=tex-text;

two mappings?

I mean: the font is called with mapping=tex-text;mapping=tex-text; which doesn't make so much sense. I can create two new mappings, so that "tlig=yes" would then be "mapping=tlig" and "trep=yes" would be "mapping=trep" instead of "mapping=tex-text". We only need to agree which features are where (and you need to remove ?` from beginner's manual). Spanish users probably have it on keyboard and "others" either don't need it or will find it somehow. << and >> are not needed either in my opinion.
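To make the proposal concrete, here is a minimal Python sketch of how a feature set could be turned into a XeTeX-style feature string, first with both trep and tlig pointing at tex-text (with the repeated entry dropped) and then with the two proposed mappings. The function name and the dictionary layout are assumptions for illustration; "mapping=trep" and "mapping=tlig" are only the names proposed above, and whether XeTeX would accept or apply two mapping= entries on one font is not established in this thread.

```python
# Hypothetical sketch of building the feature string; not ConTeXt code.

def xetex_feature_string(features, split_mappings=False):
    """Turn a feature dict into a ';'-terminated list of font features."""
    parts = []
    for key, value in features.items():
        if key in ("trep", "tlig"):
            if value != "yes":
                continue
            # Either the shared mapping or one mapping per feature.
            part = "mapping=" + (key if split_mappings else "tex-text")
        elif value == "yes":
            part = "+" + key
        else:
            part = key + "=" + value
        if part not in parts:  # drop repeated entries such as a second tex-text
            parts.append(part)
    return "".join(part + ";" for part in parts)


if __name__ == "__main__":
    default = {
        "script": "latn", "language": "dflt",
        "liga": "yes", "kern": "yes",
        "trep": "yes", "tlig": "yes",
    }
    print(xetex_feature_string(default))
    print(xetex_feature_string(default, split_mappings=True))
```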

...

...
Some (non-latin) fonts complain when one requests non-existing features.

in xetex you mean?

Yes. Well, LuaTeX probably "complains" as well (as in report >> load otf: warning: Warning: Glyph 1423 is named fi which should mean it is mapped to Unicode U+FB01, but Glyph 207 already has that encoding. etc.) - complaining is a good thing in general, it only means "please do not use latin script for non-latin fonts". And "features=default" should probably not force latin. (Then it should at least be "features=default-latin" or something similar.)

...

...
Also, it might be handy to be able to define \definetypeface[basic][rm][Xserif][whatever][script=arab,language=...] (for now forget that one, interface needs to be extended once and properly).

\definefontsynonym[a][file:Iwona-Bold.otf][mapping=tex-text] doesn't work, so the only way seems to be

\definefontfeature[xetex][mapping=tex-text]
\definefontsynonym[a][file:Iwona-Bold.otf][xetex]

I have tried to use

\definefontfeature[xetex][mapping=tex-text]
\definefontfeature[caps][+smcp]
\definefontsynonym[a][file:Iwona-Bold.otf][features={xetex,caps}]

but that didn't work.

indeed, handling comma separated lists is too slow there .. ok, we can do it for xetex and in luatex use lua for it ... or i could hash the commalist itself ... needs a bit of thinking but eventually we need to be able to combine features (this even more points into a separate definition file for xetex)

That's up to you. I don't know anything about internals here.

Mojca

Hans Hagen

7 Dec

8:01 p.m.

Mojca Miklavec wrote:

...

In LuaTeX you define two "features":
- trep (3 replacements)
- tlig (8 ligatures)

In XeTeX, these 11 and 4 others are combined in "mapping=tex-text". And I would like the ligatures to be the same on both engines, so I would like to clone the behavior from LuaTeX in XeTeX (once you decide what the default should be - from your reply I understand that you plan to change the behaviour) and split "mapping=tex-text" into more separate "mappings".

well, let's assume that hardly anyone uses tlig, so then the 8 ligs (well, not really ligs) can be kept, saves us some headache; and we just don't make it a default -)

...

That's fine by me, I would agree. Only: what about apostrophe (')? (I'm, you're, ..)

shouldn't that one be left untouched?

...

I'm not sure about it, but how is it with the apostrophe in I'm, don't, ... ?

I don't wanna know -)

...

I don't care too much about " -> right double quote and ` -> left single quote esp. the first one should better be left that way, and I never use ` (and have no idea what it's used for except in mysql and TeX source code)

how about a vote on the list ...

...

...
...
U+0021 U+0060 <> U+00A1 ; !` -> inverted exclam
U+003F U+0060 <> U+00BF ; ?` -> inverted question
U+003C U+003C <> U+00AB ; << -> LEFT POINTING GUILLEMET
U+003E U+003E <> U+00BB ; >> -> RIGHT POINTING GUILLEMET

let's get rid of it

Fine by me.

real weird things ... the french should use proper utf chars but well ...

...

...
i'd like to let caps and such go away completely for mkiv so maybe we end up with xetex defs versus luatex defs;

So ... perhaps Caps support needs to be rewritten in XeTeX as well :) I still don't understand how to get "bold italic sans caps" for example (just as I don't know how to get bold/bold italic math).

i think that caps (in xetex) can best be done by just defining an extra typeface and then switch typeface (fast too)

...

I use LuaTeX because:
- sometimes Lua is really handy and you have a marvellous database for Unicode well integrated :), nice to inspect fonts etc.

I use XeTeX because:
- LuaTeX doesn't (or at least didn't) always work as it should. XeTeX "saved my life" in May because some dirty LuaTeX bugs stopped the show in the middle (I already had a working copy - final PDF, and after minor modifications it stopped working)

that is because mkiv parses the text and if it becomes too weird, abusive, nonsense, offending it will enter a special mode

...

- others keep asking questions (and since I suggested or approved quite some bugs in XeTeX recently, I feel responsible for helping the victims :)

sure, also, for some apps xetex is more convenient (faster, and many docs are not that demanding so default feature handling can do the job well)

...

- to show off with Zapfino - I don't have it as OpenType :-)

hm, you mean that you're using the apple font model ... now who was talking about portability ...

...

In any case: what I really love about ConTeXt is that [more-or-less] no change is needed to compile the same (simple) document with either engine (and compare the result or to switch quickly if support in one engine is buggy).

so we should keep that property

...

When using lua that is no longer true, but still, same definitions and similar results with both engines would be nice. (Esp. if both engines will be merged in the future :)

by then i'm retired and you'll have to do the coding ... both engines fill in a niche and do that well so no problem to support both; eventually mkii will be the xetex thing and mkiv the luatex thing

...

...
...
The problem is that "features=default" implies "script=latn", which is not always desired. A copy of mapping=tex-text comes from tlig & trep substitution.

we can fall back to dflt which in practice boils down to latn

That's probably better.

i removed the script/langs already

...

...
...
I assume that script=latn;language=dflt;+liga;+kern; is always on by default (where needed), so basically mapping=tex-text is the only thing that really needs to be added.

well, i'd prefer ... only -- and --- and make anything else up to the user, which means, redefining default in cont-sys if needed

...
[Iwona-Bold.otf]:script=latn;language=dflt;+liga;+kern;mapping=tex-text;mapping=tex-text;

two mappings?

I mean: the font is called with mapping=tex-text;mapping=tex-text; which doesn't make so much sense.

no, but let's not waste too much tex processing on it, doesn't hurt i guess

...

I can create two new mappings, so that "tlig=yes" would then be "mapping=tlig" and "trep=yes" would be "mapping=trep" instead of "mapping=tex-text".

yes, can be part of the context zip

...

We only need to agree which features are where (and you need to remove ?` from beginner's manual). Spanish users probably have it on keyboard and "others" either don't need it or will find it somehow.

ah .. manual rewriting ...

...

<< and >> are not needed either in my opinion.

...
...
Some (non-latin) fonts complain when one requests non-existing features.

in xetex you mean?

Yes. Well, LuaTeX probably "complains" as well (as in report >> load otf: warning: Warning: Glyph 1423 is named fi which should mean it is mapped to Unicode U+FB01, but Glyph 207 already has that encoding. etc.) - complaining is a good thing in general, it only means "please do not use latin script for non-latin fonts". And "features=default" should probably not force latin. (Then it should at least be "features=default-latin" or something similar.)

luatex does not care because it does nothing with features, it loads the file ... and, mkiv nicely checks for the features supported, so if you ask for mkmk in latin modern it will just ignore it (part of setting up the processing sequences and caches), mkiv does not even bother you with a message
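The behaviour described here, checking requested features against what the font offers and silently dropping the rest, can be sketched in a few lines of Python. The function name and the example feature sets are assumptions for illustration only; in particular the set given for the font is made up and is not the real feature list of Latin Modern.

```python
# Hypothetical sketch of ignoring unsupported features; not mkiv code.

def usable_features(requested, supported):
    """Keep the requested features the font provides; drop the others quietly."""
    return [name for name in requested if name in supported]


if __name__ == "__main__":
    # Assumed, illustrative feature set for some Latin font.
    font_features = {"liga", "kern"}
    # mkmk is requested but not offered, so it is dropped without a message.
    print(usable_features(["liga", "kern", "mkmk"], font_features))
```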

...

...
...
Also, it might be handy to be able to define \definetypeface[basic][rm][Xserif][whatever][script=arab,language=...] (for now forget that one, interface needs to be extended once and properly).

\definefontsynonym[a][file:Iwona-Bold.otf][mapping=tex-text] doesn't work, so the only way seems to be

\definefontfeature[xetex][mapping=tex-text]
\definefontsynonym[a][file:Iwona-Bold.otf][xetex]

I have tried to use

\definefontfeature[xetex][mapping=tex-text]
\definefontfeature[caps][+smcp]
\definefontsynonym[a][file:Iwona-Bold.otf][features={xetex,caps}]

but that didn't work.

indeed, handling comma separated lists is too slow there .. ok, we can do it for xetex and in luatex use lua for it ... or i could hash the commalist itself ... needs a bit of thinking but eventually we need to be able to combine features (this even more points into a separate definition file for xetex)

That's up to you. I don't know anything about internals here.

you can't fool me ..

Taco Hoekwater

10:16 p.m.

Hans Hagen wrote:

...

...
That's fine by me, I would agree. Only: what about apostrophe (')? (I'm, you're, ..)

shouldn't that one be left untouched?

On typewriters (the real ones) there was often a character that slanted upward, and it was used for end-of-quote as well as for the "acute" accent. The character in ASCII looks like a vertical stripe, but it is definitely supposed to represent an apostrophe there, along with other meanings (on my physical keyboard, it looks like an acute accent).

Unicode actually prefers 0x2019, "RIGHT SINGLE QUOTATION MARK" (which is semantically quite different from "apostrophe", imho), but it does name the character at 0x27 "APOSTROPHE". It states that character to be a "neutral (vertical) glyph with mixed usage", leaving it wide open to interpretation.

I personally would really hate having to use a keyboard mapper for phrases like "I'm sure" and "Hans' macros". Please keep the ' to ’ remapping, at least in roman proportional fonts.

...

...
esp. the first one should better be left that way, and I never use ` (and have no idea what it's used for except in mysql and TeX source code)

It is used in _lots_ of programming languages, but other than that I don't know. The Unicode name of 0x60 is "GRAVE ACCENT". I assume it is inherited from the typewriter era (circumflex as well).

Best wishes,
Taco

Hans Hagen

10:27 p.m.

Taco Hoekwater wrote:

...

I personally would really hate having to use a keyboard mapper for phrases like "I'm sure" and "Hans' macros". Please keep the ' to ’ remapping, at least in roman proportional fonts.

in that case i wonder if we could best make it a static feature i.e. i just remap the character (for taco: only an initializer in mkiv node processing) which saves a node pass

Hans
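A static remap of this kind amounts to copying the glyph data once when the font is set up, so that no later pass over the text is needed. The Python sketch below mirrors the shape of the texquotes initializer quoted earlier in the thread, restricted to the apostrophe; the function name and the glyph records are invented for illustration and do not reflect the real font table layout.

```python
# Hypothetical sketch of a load-time remap; not the mkiv initializer itself.
import copy


def remap_apostrophe(characters):
    """Make slot 0x27 a copy of the glyph data at 0x2019, once, at font load."""
    if 0x2019 in characters:
        characters[0x0027] = copy.deepcopy(characters[0x2019])
    return characters


if __name__ == "__main__":
    # Assumed, illustrative glyph records.
    chars = {
        0x0027: {"name": "quotesingle", "width": 200},
        0x2019: {"name": "quoteright", "width": 250},
    }
    remap_apostrophe(chars)
    print(chars[0x0027]["name"])
```

After the call, typing ' selects the right single quote glyph directly, which is the sense in which the remap costs nothing while the text is processed.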
