Thank you for the prompt and thorough response!

If a separate reordering rule is needed for every pair of characters in different combining classes that canonical ordering puts in the wrong typographical order, then there will be a lot of substitution rules (probably hundreds). I am not very familiar with coding in Lua, but if there is a way to define substitution features over whole classes of points, then that would require far fewer cases.

Unicode's canonical ordering of Hebrew marks is based on their combining classes: characters in higher combining classes are sorted after those in lower combining classes. The typographically recommended ordering of certain characters is found in Table 1 (p. 12) of https://www.sbl-site.org/Fonts/SBLHebrewUserManual1.5x.pdf. The following list gives each character class together with its Unicode combining classes (which I retrieved from the Lua script https://raw.githubusercontent.com/michal-h21/uninormalize/master/char-def-with-ccc.lua); it is numbered to match the character classes described in that table:

1. The consonants (Unicode points 05D0-05EA) have no combining class and are never reordered; this is typographically correct.

2. Shin dot and sin dot (05C1-05C2) should be next, but Unicode places them in combining classes 24 and 25, after the characters in recommended classes 3-5 and many of the characters in recommended class 6.

3. Dagesh / mapiq (05BC) should be next, but Unicode assigns it a combining class of 21. This means that, after Unicode normalization, it will be incorrectly ordered before characters in recommended class 2 and after characters in recommended class 5 and some of those in recommended class 6 (05B0-05B8, 05BB, 05C7); its position before recommended class 4 is correct.

4. Rafe (05BF) should be next, but Unicode assigns it a combining class of 23. Thus, after Unicode normalization, it will be correctly placed after characters in recommended class 3, but incorrectly placed before characters in recommended class 2 and after characters in recommended class 5 and some of those in recommended class 6 (05B0-05B8, 05BB, 05BD, 05C7).

5. The holam and holam haser vowel points (05B9-05BA) should be next, but Unicode places them in combining class 19. This means that, after Unicode normalization, they will be incorrectly placed before characters in recommended classes 2-4 and after the characters in recommended class 6 that have lower combining classes (05B0-05B8, 05C7).

6. The characters 0591, 0596, 059B, 05A2-05A7, 05AA, 05B0-05B8, 05BB, 05BD, 05C5, 05C7 should be next and should be treated as being in the same class, but Unicode places them in combining classes 10-18, 20, 22, and 220.

7. The prepositive marks yetiv and dehi (059A, 05AD) should be next; Unicode places them in combining class 222, so they should correctly come after all characters in recommended classes 1-6.

8. The characters 0307, 0593-0595, 0597-0598, 059C-05A1, 05A8, 05AB-05AC, 05AF, 05C4 should be treated as being in the same class; Unicode places them in combining class 230, so they should correctly come after all characters in recommended classes 1-7.

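9. The postpositive marks segolta, pashta, telisha qetana, and zinor (0592, 0599, 05A9, 05AE) should be next; Unicode places them in combining class 230, the same class as the characters in recommended class 8, so normalization leaves the two groups in their input order and they will need to be reordered after the characters in recommended class 8.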
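
To make the class-based idea concrete, here is a rough sketch in plain Lua of reordering by recommended class rather than by pair. It only restates the list above as a lookup table and sorts each run of marks with a stable insertion sort. The names `recommended`, `class_of`, and `reorder_marks` are made up for illustration and are not part of ConTeXt; the sketch works on a plain array of code points and does not hook into the font handler.

```lua
-- Recommended typographic class for each mark, following the numbered list
-- above. Each entry is an inclusive { first, last } range of code points.
local recommended = {
  [2] = { {0x05C1, 0x05C2} },
  [3] = { {0x05BC, 0x05BC} },
  [4] = { {0x05BF, 0x05BF} },
  [5] = { {0x05B9, 0x05BA} },
  [6] = { {0x0591, 0x0591}, {0x0596, 0x0596}, {0x059B, 0x059B},
          {0x05A2, 0x05A7}, {0x05AA, 0x05AA}, {0x05B0, 0x05B8},
          {0x05BB, 0x05BB}, {0x05BD, 0x05BD}, {0x05C5, 0x05C5},
          {0x05C7, 0x05C7} },
  [7] = { {0x059A, 0x059A}, {0x05AD, 0x05AD} },
  [8] = { {0x0307, 0x0307}, {0x0593, 0x0595}, {0x0597, 0x0598},
          {0x059C, 0x05A1}, {0x05A8, 0x05A8}, {0x05AB, 0x05AC},
          {0x05AF, 0x05AF}, {0x05C4, 0x05C4} },
  [9] = { {0x0592, 0x0592}, {0x0599, 0x0599}, {0x05A9, 0x05A9},
          {0x05AE, 0x05AE} },
}

-- Flatten the ranges into a code point -> class lookup.
local class_of = {}
for class, ranges in pairs(recommended) do
  for _, r in ipairs(ranges) do
    for cp = r[1], r[2] do
      class_of[cp] = class
    end
  end
end

-- Return a copy of cps in which every run of consecutive marks is sorted by
-- recommended class. The insertion sort is stable, so marks in the same
-- class keep their input order. Code points without a class (consonants and
-- anything not listed) are left in place and end the current run.
local function reorder_marks(cps)
  local out = {}
  local run_start = nil
  for i, cp in ipairs(cps) do
    out[i] = cp
    if class_of[cp] then
      run_start = run_start or i
      local j = i
      while j > run_start and class_of[out[j - 1]] > class_of[out[j]] do
        out[j - 1], out[j] = out[j], out[j - 1]
        j = j - 1
      end
    else
      run_start = nil
    end
  end
  return out
end
```

For the example word, `reorder_marks({ 0x5D1, 0x5B6, 0x5BC, 0x5DF })` (bet + segol + dagesh + final nun) moves the dagesh ahead of the segol and leaves the consonants where they are. With this approach the amount of data grows with the number of classes, not with the number of out-of-order pairs.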

This is a lot of information, and I've probably not presented it as clearly as I could, so if anything is confusing, please let me know, and I can try to explain it better. If there is any other information you need, please let me know.

Thanks again!

On Tue, Apr 28, 2020 at 9:17 AM Hans Hagen <j.hagen@xs4all.nl> wrote:

On 4/28/2020 1:59 PM, Joey McCollum wrote:
> \definefontfeature[f:pointedhebrew][default][
> ccmp=yes,
> mark=yes,
> script=hebr
> ]
> \definefontfamily[hebrew] [rm] [SBL Hebrew] [features=f:pointedhebrew]
> %Set the body font:
> \setupbodyfont[hebrew]
> %Set up right-to-left alignment:
> \setupalign[r2l]
> \starttext
> %Characters after normalization, in Unicode canonical order (bet + segol + dagesh + final nun):
> בֶּן
>
> %A word with characters in typographically recommended order (bet + dagesh + segol + final nun):
> בֶּן
> \stoptext

\startluacode
fonts.handlers.otf.addfeature {
    name    = "normalizehebrew",
    type    = "chainsubstitution",
    prepend = 1,
    lookups = {
        {
            type = "multiple",
            data = {
                -- replace segol (U+05B6) by dagesh (U+05BC) + segol
                [0x5B6] = { 0x5BC, 0x5B6 },
            },
        },
    },
    data = {
        rules = {
            {
                -- context: segol immediately followed by dagesh
                current = { { 0x5B6 }, { 0x5BC } },
                lookups = { 1, 0 },
            },
        },
    },
}
\stopluacode

\definefontfeature
[f:pointedhebrew]
[hebrew]
[normalizehebrew=yes]

\definefontfamily[hebrew] [rm] [SBL Hebrew] [features=f:pointedhebrew]

\setupbodyfont[hebrew]

\setupalign[r2l]

\starttext
בֶּן \quad בֶּן \par
\stoptext

How many such reorderings are there? (I saw some document about that
font and it sounds a bit messy wrt all these input variants.)

(There are several mechanisms in ConTeXt to deal with such issues; it's
all about getting specs from users, i.e. TeX is all about control, so in
principle it should be doable.)

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------