On 2010-10-03 <17:43:21>, Thomas A. Schmitz wrote:
OK, I'll write something for German and English, but the thing is that we need more input on what users expect. For mixtures with foreign languages, there might not be any generally accepted rules at all, so people will define something on an ad-hoc basis.
Hi Thomas and others,

technically speaking the problem is solved by ISO 14651.[1] In practice, multilingual sorting depends on local rules, of which “one index per script or language” seems to be the most common.

Some time ago I made an LPeg grammar from the BNF in [1]. It matches the collation rules from [2], but as I couldn’t figure out how to map them onto ConTeXt’s sorting mechanism I never got around to actually capturing the information. As I won’t have the time to try it with the new structure of sort-lan, I guess I’ll just attach the PEG grammar for anyone to use as a starting point. Unicode collation would be great to have in ConTeXt.
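To make the “one index per script” convention a bit more concrete, here is a rough sketch in plain Lua. It only splits entries into buckets by the script of their first character; each bucket would then be sorted with its own language rules. The helper names (first_codepoint, script_of, split_by_script) are made up for illustration and are not part of sort-lan.lua, and the script test only knows the Greek and Greek Extended blocks, which is an assumption that would need extending for real use.

```lua
-- Hypothetical helper: decode the first UTF-8 code point of a string.
-- Assumes well-formed UTF-8; returns nil for an empty string.
local function first_codepoint(s)
  local b1 = s:byte(1)
  if not b1 then return nil end
  if b1 < 0x80 then return b1 end            -- plain ASCII
  local n, cp
  if b1 >= 0xF0 then n, cp = 3, b1 - 0xF0    -- 4-byte sequence
  elseif b1 >= 0xE0 then n, cp = 2, b1 - 0xE0 -- 3-byte sequence
  else n, cp = 1, b1 - 0xC0 end              -- 2-byte sequence
  for i = 2, n + 1 do
    cp = cp * 64 + (s:byte(i) - 0x80)        -- append 6 bits per byte
  end
  return cp
end

-- Hypothetical classifier: only Greek (U+0370..U+03FF) and
-- Greek Extended (U+1F00..U+1FFF) are recognised; all else is "other".
local function script_of(cp)
  if cp and ((cp >= 0x0370 and cp <= 0x03FF) or
             (cp >= 0x1F00 and cp <= 0x1FFF)) then
    return "greek"
  end
  return "other"
end

-- Split a list of index entries into one list per script,
-- keeping the original order inside each list.
local function split_by_script(words)
  local indexes = {}
  for _, w in ipairs(words) do
    local s = script_of(first_codepoint(w))
    indexes[s] = indexes[s] or {}
    table.insert(indexes[s], w)
  end
  return indexes
end
```

Calling split_by_script({ "Zeus", "ἀρετή", "λόγος", "arete" }) would give one list for the Greek entries and one for the rest; the actual ordering inside each list is still up to the per-language rules.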
transliteration. The problem with polytonic Greek is that so many different unicode characters need to have the same sort entry. If
Isn’t that just what the Greek rules in sort-lan.lua do? If not then it would be a bug.

····startsnippet·················································
definitions["gr"] = {
    entries = {
        ["α"] = "α", ["ά"] = "α", ["ὰ"] = "α", ["ᾶ"] = "α", ["ᾳ"] = "α",
        ["ἀ"] = "α", ["ἁ"] = "α", ["ἄ"] = "α", ["ἂ"] = "α", ["ἆ"] = "α",
        ["ἁ"] = "α", ["ἅ"] = "α", ["ἃ"] = "α", ["ἇ"] = "α", ["ᾁ"] = "α",
        ["ᾴ"] = "α", ["ᾲ"] = "α", ["ᾷ"] = "α", ["ᾄ"] = "α", ["ᾂ"] = "α",
        ["ᾅ"] = "α", ["ᾃ"] = "α", ["ᾆ"] = "α", ["ᾇ"] = "α",
        ["β"] = "β",
····stopsnippet··················································

Always nice to have a decent discussion on sorting ;)

Philipp

[1] http://standards.iso.org/ittf/PubliclyAvailableStandards/c044872_ISO_IEC_146...
[2] http://www.iso.org/ittf/ISO14651_2006_TABLE1_En.txt

--
() ascii ribbon campaign - against html e-mail
/\ www.asciiribbon.org - against proprietary attachments