Re: [NTG-context] two buglets

5 Oct 2010

      On 2010-10-03 <17:43:21>, Thomas A. Schmitz wrote:
...
OK, I'll write something for German and English, but the thing
is that we need more input what users expect. For mixtures with
foreign languages, there might not be generally accepted rules at
all, so people will define something on an ad-hoc basis.

Hi Thomas and others,

technically speaking the problem is solved by ISO 14651.[1]

In praxi multilingual sorting depends on local rules, of
which “One index per script|language.” seems to be the most
common.
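
As a rough illustration of the "one index per script" convention, here is a
minimal Lua sketch that splits a word list into one list per script. The
names firstcodepoint, scriptof and splitbyscript are hypothetical, the code
point ranges are a deliberate simplification, and the plain byte-order sort
is only a placeholder for real collation.

```lua
-- Decode the first code point of a UTF-8 string (no validation).
local function firstcodepoint(s)
    local b1, b2, b3, b4 = s:byte(1, 4)
    if not b1 then return nil end
    if b1 < 0x80 then return b1 end
    if b1 < 0xE0 then return (b1 - 0xC0) * 0x40 + (b2 - 0x80) end
    if b1 < 0xF0 then
        return (b1 - 0xE0) * 0x1000 + (b2 - 0x80) * 0x40 + (b3 - 0x80)
    end
    return (b1 - 0xF0) * 0x40000 + (b2 - 0x80) * 0x1000
         + (b3 - 0x80) * 0x40 + (b4 - 0x80)
end

-- Assumption: the script of a word is decided by its first character only.
local function scriptof(word)
    local cp = firstcodepoint(word)
    if not cp then return "other" end
    if (cp >= 0x0370 and cp <= 0x03FF) or (cp >= 0x1F00 and cp <= 0x1FFF) then
        return "greek"      -- Greek and Greek Extended blocks
    elseif cp >= 0x0400 and cp <= 0x04FF then
        return "cyrillic"
    elseif cp < 0x0250 then
        return "latin"      -- simplification: ASCII up to Latin Extended-B
    end
    return "other"
end

-- Build one list per script; each list is sorted on its own.
local function splitbyscript(words)
    local indexes = {}
    for _, word in ipairs(words) do
        local script = scriptof(word)
        indexes[script] = indexes[script] or {}
        table.insert(indexes[script], word)
    end
    for _, list in pairs(indexes) do
        table.sort(list)    -- placeholder: byte order, not language rules
    end
    return indexes
end

local indexes = splitbyscript { "zeta", "ἀρχή", "alpha", "βίος" }
```

Each per-script list would then need its own language-specific comparison in
place of the plain table.sort call.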

Some time ago I made an LPeg grammar from the BNF in [1]. It matches
the collation rules from [2], but as I couldn’t figure out how to map
them onto ConTeXt’s sorting mechanism I never got around to
actually capturing the information. As I won’t have the time
to try it with the new structure of sort-lan, I guess I’ll just
attach the PEG grammar for anyone to use as a starting point.
Unicode collation would be great to have in ConTeXt.
...
transliteration. The problem with polytonic Greek is that so many
different Unicode characters need to have the same sort entry. If

Isn’t that just what the Greek rules in sort-lan.lua do? If not,
then it would be a bug.

····startsnippet·················································

definitions["gr"] = {
    entries = {
        ["α"] = "α", ["ά"] = "α", ["ὰ"] = "α", ["ᾶ"] = "α", ["ᾳ"] = "α",
        ["ἀ"] = "α", ["ἁ"] = "α", ["ἄ"] = "α", ["ἂ"] = "α", ["ἆ"] = "α",
        ["ἁ"] = "α", ["ἅ"] = "α", ["ἃ"] = "α", ["ἇ"] = "α", ["ᾁ"] = "α",
        ["ᾴ"] = "α", ["ᾲ"] = "α", ["ᾷ"] = "α", ["ᾄ"] = "α", ["ᾂ"] = "α",
        ["ᾅ"] = "α", ["ᾃ"] = "α", ["ᾆ"] = "α", ["ᾇ"] = "α", ["β"] = "β",

····stopsnippet··················································
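
To show what such a table buys you, here is a minimal sketch of how a
mapping of this shape could be turned into a sort key. This is not the code
ConTeXt actually uses: the shortened entries table and the names utf8char
and sortkey are made up for the example, and comparing the keys bytewise
only works here because the base letters happen to be in code point order.

```lua
-- Hypothetical, shortened stand-in for the "entries" table quoted above.
local entries = {
    ["α"] = "α", ["ᾶ"] = "α", ["ἀ"] = "α", ["ἄ"] = "α",
    ["β"] = "β",
}

-- Matches one UTF-8 encoded character (NUL bytes are not handled).
local utf8char = "[\1-\127\194-\244][\128-\191]*"

-- Replace every character that has an entry by its base letter;
-- characters without an entry are left untouched.
local function sortkey(word)
    return (word:gsub(utf8char, function(c) return entries[c] end))
end

local words = { "βᾶ", "ἀβ", "αα" }

-- Compare the reduced keys instead of the raw strings.
table.sort(words, function(a, b) return sortkey(a) < sortkey(b) end)
-- words is now { "αα", "ἀβ", "βᾶ" }
```

Without the mapping, "ἀβ" would sort after "βᾶ", because the precomposed
characters of the Greek Extended block have higher code points than the
plain letters.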

Always nice to have a decent discussion on sorting ;)

Philipp

[1] http://standards.iso.org/ittf/PubliclyAvailableStandards/c044872_ISO_IEC_146...
[2] http://www.iso.org/ittf/ISO14651_2006_TABLE1_En.txt

-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments

Philipp Gesang