[NTG-context] sort-lan.lua nitpicks and sorting

Philipp Gesang pgesang at ix.urz.uni-heidelberg.de
Sun May 2 15:59:53 CEST 2010

Hi again,

1. In sort-lan.lua, line 101 should read «['r'] = "r"», and line 144
«['r'] = 26, -- r».

2. I have read the disclaimer that said file is “preliminary and
incomplete”, but is there some rationale behind the range of integers
in each language mapping? The mapping for English goes from 1 to 51,
advancing by 2 for each letter (which is odd because it should start
from index 3 with “a”, shouldn't it?), while the Czech one goes from
1 to 40 without skipping, and the Finnish and Austrian ones from 1 to 58.

  What about mapping them onto a larger but common scale? That would
make multilingual sorting easier, because the letter representing the
phoneme /a/ would map to the same value across languages:†
  ["a"] = 3, -- in a Latin mapping,
  ["α"] = 3, -- in a Greek mapping,
  ["а"] = 3, -- in a Russian mapping.
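
To make the idea more concrete, here is a minimal sketch of such a common scale in plain Lua. It is not taken from sort-lan.lua: the table `common`, the helpers `key` and `less`, and all weights other than the 3 for /a/ are made up for illustration.

```lua
-- Hypothetical sketch, not code from sort-lan.lua: per-script mappings
-- that share one numeric scale. Only the 3 for /a/ comes from the
-- proposal above; the other weights are arbitrary.
local common = {
    latin    = { ["a"] = 3, ["m"] = 27, ["t"] = 41 },
    greek    = { ["α"] = 3, ["μ"] = 27, ["τ"] = 41 },
    cyrillic = { ["а"] = 3, ["м"] = 27, ["т"] = 41 },
}

-- Merge all mappings into a single lookup table.
local weights = {}
for _, mapping in pairs(common) do
    for char, weight in pairs(mapping) do
        weights[char] = weight
    end
end

-- Split a UTF-8 string into characters and look up each weight
-- (0 for anything that has no entry).
local function key(word)
    local k = {}
    for char in word:gmatch("[\1-\127\194-\244][\128-\191]*") do
        k[#k + 1] = weights[char] or 0
    end
    return k
end

-- Compare two words weight by weight; the shorter word wins on a shared prefix.
local function less(a, b)
    local ka, kb = key(a), key(b)
    for i = 1, math.min(#ka, #kb) do
        if ka[i] ~= kb[i] then
            return ka[i] < kb[i]
        end
    end
    return #ka < #kb
end

local words = { "та", "am", "μα" }
table.sort(words, less)
print(table.concat(words, " ")) -- am μα та
```

With one shared table, entries from different scripts interleave by sound value and not by code point.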

3. Is it intended that the digraph “ch” resolves (temporarily) to
U+FF01, the fullwidth exclamation mark
(http://www.fileformat.info/info/unicode/char/ff01/index.htm), according
to line 72?
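
My guess, and it is only a guess about the intent, is that the digraph is collapsed into a single placeholder character so that it can carry a weight of its own. The sketch below shows that technique in isolation; it is not what sort-lan.lua actually does, and the weights and the names `placeholder`, `key` and `less` are made up.

```lua
-- Hypothetical sketch of digraph handling via a placeholder character.
-- The weights are illustrative; they only encode the Czech rule that
-- "ch" sorts after "h" and before "i".
local placeholder = "\239\188\129" -- UTF-8 bytes of U+FF01

local weights = {
    ["c"] = 5,
    ["h"] = 15,
    [placeholder] = 16, -- stands for the digraph "ch"
    ["i"] = 17,
}

-- Replace the digraph, then turn each UTF-8 character into its weight
-- (0 for characters without an entry).
local function key(word)
    local k = {}
    word = word:gsub("ch", placeholder)
    for char in word:gmatch("[\1-\127\194-\244][\128-\191]*") do
        k[#k + 1] = weights[char] or 0
    end
    return k
end

-- Compare two words weight by weight.
local function less(a, b)
    local ka, kb = key(a), key(b)
    for i = 1, math.min(#ka, #kb) do
        if ka[i] ~= kb[i] then
            return ka[i] < kb[i]
        end
    end
    return #ka < #kb
end

local words = { "chata", "ikona", "cena", "hora" }
table.sort(words, less)
print(table.concat(words, " ")) -- cena hora chata ikona
```

Any code point that cannot occur in the input would serve equally well as the placeholder.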

Feel free to state more general opinions on the sorting topic, as I am
playing with different ways of sorting my bibliography. I would be glad
of any advice.


†   I know this is impractical for many writing systems, and even within
the set of Latin- or Greek-based alphabets, how much precision you need
in sorting depends largely on the purpose at hand.

