sort-lan.lua nitpicks and sorting

2 May 2010

Hi again,

1. In sort-lan.lua, line 101 should read «['r'] = "r"», and line 144
«['r'] = 26, -- r».

2. I did read the disclaimer about said file being “preliminary and
incomplete”, but is there some rationale behind the range of integers
used for each language mapping? The mapping for English goes from 1 to
51, interleaving two integers for each letter (which is odd, because it
should then start from index 3 with “a”, shouldn't it?), while the Czech
one goes from 1 to 40 without skipping, and the Finnish and Austrian
ones from 1 to 58.

  What about mapping them onto a larger but common scale? That would
ease multilingual sorting, because the alphabetical representation of
the phoneme /a/ would map to the same value across languages.†
For example:
  ["a"] = 3, -- in a Latin mapping,
  ["α"] = 3, -- in a Greek mapping,
  ["а"] = 3, -- in a Russian mapping.
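To make the idea concrete, here is a minimal sketch of how I imagine it.
Everything in it is an assumption for illustration: the table names, the
weights for the second letter of each alphabet, and the helpers sortkey
and less are made up and not taken from sort-lan.lua. It also assumes
the source file is saved as UTF-8.

```lua
-- Hypothetical per-script mappings that share one common scale.
local latin    = { ["a"] = 3, ["b"] = 5 }
local greek    = { ["α"] = 3, ["β"] = 5 }
local cyrillic = { ["а"] = 3, ["б"] = 5 }

-- Merge the per-script tables into a single lookup table.
local weights = {}
for _, mapping in ipairs({ latin, greek, cyrillic }) do
  for char, weight in pairs(mapping) do
    weights[char] = weight
  end
end

-- Split a UTF-8 string into characters and look up each weight.
-- Characters without an entry get weight 0 and sort first.
local function sortkey(word)
  local key = {}
  for char in word:gmatch("[\1-\127\194-\244][\128-\191]*") do
    key[#key + 1] = weights[char] or 0
  end
  return key
end

-- Compare two words weight by weight; a shorter prefix sorts first.
local function less(a, b)
  local ka, kb = sortkey(a), sortkey(b)
  for i = 1, math.min(#ka, #kb) do
    if ka[i] ~= kb[i] then
      return ka[i] < kb[i]
    end
  end
  return #ka < #kb
end

local words = { "ба", "ab", "αα" }
table.sort(words, less)
```

With such a table the three scripts interleave by weight rather than by
code point, so “αα” ends up before “ab” and “ба” comes last.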

3. Is it intended that the digraph “ch” resolves (temporarily) to
U+FF01 FULLWIDTH EXCLAMATION MARK, see
http://www.fileformat.info/info/unicode/char/ff01/index.htm, according
to line 72?
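My guess is that the digraph gets folded into a single placeholder
character so that it can carry a weight of its own. The following is
only a sketch of that guess, not the actual code from sort-lan.lua; the
function names fold_digraphs and utf8_chars are hypothetical.

```lua
-- UTF-8 encoding of U+FF01 (bytes EF BC 81), used here as a stand-in
-- for the digraph "ch".
local placeholder = "\239\188\129"

-- Replace every "ch" with the placeholder so that it counts as one
-- character in a later weight lookup.
local function fold_digraphs(word)
  return (word:gsub("ch", placeholder))
end

-- Split a UTF-8 string into a list of characters.
local function utf8_chars(s)
  local chars = {}
  for char in s:gmatch("[\1-\127\194-\244][\128-\191]*") do
    chars[#chars + 1] = char
  end
  return chars
end

local folded = fold_digraphs("chata")
local chars  = utf8_chars(folded)
```

If that reading is right, a word that really contains U+FF01 would
collide with the digraph, which is why I am asking whether the choice is
deliberate.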

Feel free to share more general opinions on the sorting topic, as I am
playing with different ways of sorting my bibliography. I would be
grateful for any advice,

Philipp

†   I know this is impractical for many writing systems, and even within
the set of Latin- or Greek-based alphabets the precision you need in
sorting largely depends on the purpose at hand.

-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments

Philipp Gesang
