On Oct 3, 2010, at 5:10 PM, Hans Hagen wrote:
mm zm pm : use mapping order, add -1, 0, +1 to different case and use shape info for missing entries (similar shapes)
mc zc pc : use mapping order, add -1, 0, +1 to different case
uc       : unicode order
so, you define a sequence of comparisons where for instance
U   -> order u +/- 1
\"u -> order of shape u +/- 1
etc .. a bit cryptic I admit ... some combinations give the same result depending on the vectors used. (Jano promised to write up something.)
numbers are sorted in a special way
so, at some point we simplify characters and start looking at shapes and sort based on shapes, which of course leads to clashes, so in a next step we look at unicodes, etc.
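To make the "shapes first, unicodes on a clash" idea concrete, here is a minimal sketch in Python. It is not the ConTeXt sorter: the names `shape` and `sort_key` are made up for illustration, "shape" is assumed to mean the base letter left after dropping combining marks, and case is treated as a plain tie-break rather than the -1/0/+1 offset described above.

```python
import unicodedata

def shape(ch):
    """Assumed notion of 'shape': the base letter of ch without combining marks."""
    decomposed = unicodedata.normalize("NFD", ch)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base or ch

def sort_key(word):
    # Compose first so that each accented letter is a single character.
    word = unicodedata.normalize("NFC", word)
    shapes = tuple(shape(c).lower() for c in word)          # pass 1: simplified shapes
    cases = tuple(0 if c.islower() else 1 for c in word)    # pass 2: lowercase before uppercase
    codes = tuple(ord(c) for c in word)                     # pass 3: code points settle clashes
    return (shapes, cases, codes)

words = ["über", "Uhr", "uhr", "Ufer", "ufer", "Übung"]
print(sorted(words, key=sort_key))
```

Because the key is a tuple, the later passes are only consulted when the earlier ones compare equal, which is the "leads to clashes, so in a next step ..." behaviour.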
OK, that makes sense. I'll play with it, but having a few choice pages on the wiki would be great!
best would be to have a test file per language with in comments the expected order; such tests should also provide foreign entries
for instance, how would you mix German and Greek in your books; we probably need some specialized vectors then, which is possible as the sorting language can be configured independently of the text language
OK, I'll write something for German and English, but the thing is that we need more input on what users expect. For mixtures with foreign languages, there might not be generally accepted rules at all, so people will define something on an ad-hoc basis.

For Greek: I just looked at a dozen books here on my shelf. Most English books have a separate index for Greek terms; when they sort Greek terms with English words, they use transliteration. The problem with polytonic Greek is that so many different Unicode characters need to have the same sort entry. If I ever see the necessity of setting this up, I'll be in touch off-list, but it's such an unusual thing that I think you shouldn't bother now.

All best
Thomas
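
To give a feeling for the polytonic Greek problem, the following Python sketch groups the code points of the Greek and Greek Extended blocks by their base letter. It assumes that "same sort entry" can be approximated by canonical decomposition (NFD) with the combining marks dropped; the helper name `base_letter` is hypothetical, and a real sort vector would need more care (final sigma, for instance).

```python
import unicodedata
from collections import defaultdict

def base_letter(ch):
    """First non-combining character of the canonical decomposition of ch."""
    for c in unicodedata.normalize("NFD", ch):
        if not unicodedata.combining(c):
            return c
    return ch

groups = defaultdict(list)
# Greek and Coptic (U+0370..U+03FF) and Greek Extended (U+1F00..U+1FFF)
for block in (range(0x0370, 0x0400), range(0x1F00, 0x2000)):
    for cp in block:
        ch = chr(cp)
        if unicodedata.category(ch).startswith("L"):   # assigned letters only
            groups[base_letter(ch)].append(ch)

alpha = "\u03b1"  # GREEK SMALL LETTER ALPHA
print(len(groups[alpha]), "code points share the entry for", alpha)
print(" ".join(groups[alpha]))
```

Every character printed on the last line would have to end up under the same index entry as a plain alpha.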