[NTG-context] sort-lan.lua nitpicks and sorting

Philipp Gesang pgesang at ix.urz.uni-heidelberg.de
Sun May 2 15:59:53 CEST 2010

Hi again,

1. In sort-lan.lua, line 101 should read «['r'] = "r"», and line 144
«['r'] = 26, -- r».
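
Slips like the two above could be caught mechanically by checking that every letter listed for a language also has a sort value, and that no sort value is used twice. A minimal sketch, assuming one table that maps letters to their index entry and one that maps letters to integers; the names `entries`, `mappings` and `check` and all values except `['r'] = 26` are hypothetical, not copied from sort-lan.lua:

```lua
-- Hypothetical data in the style of the two tables discussed above.
local entries  = { ["p"] = "p", ["q"] = "q", ["r"] = "r" }
local mappings = { ["p"] = 24,  ["q"] = 25,  ["r"] = 26 }

-- Collect problems: letters without a sort value, and sort values
-- that are shared by more than one letter.
local function check(entries, mappings)
  local problems, seen = {}, {}
  for letter in pairs(entries) do
    if mappings[letter] == nil then
      problems[#problems + 1] = "no mapping for " .. letter
    end
  end
  for letter, value in pairs(mappings) do
    if seen[value] then
      problems[#problems + 1] = "value " .. value .. " used twice"
    end
    seen[value] = letter
  end
  return problems
end

print(#check(entries, mappings))  --> 0
```

An empty result only says the two tables agree with each other; it cannot tell whether the chosen values are the intended ones.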

2. I have read the disclaimer about said file being “preliminary and
incomplete”, but is there some rationale behind the range of integers for
each language mapping? The mapping for English runs from 1 to 51,
advancing in steps of 2 for each letter (which is odd because it should
start from index 3 with “a”, shouldn't it?), while the Czech one runs
from 1 to 40 without skipping, and the Finnish and Austrian ones from
1 to 58.

  What about mapping them onto a larger but common scale that would
facilitate multilingual sorting, so that the alphabetical representation
of the phoneme /a/ maps to the same value across different languages?†
  ["a"] = 3, -- in a Latin mapping,
  ["α"] = 3, -- in Greek mapping,
  ["а"] = 3, -- in a Russian mapping.
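
The shared-scale idea can be tried out in a few lines. This is only a sketch under stated assumptions: the table `shared`, the helpers `sortkey` and `before`, and all integer values are illustrative and not taken from sort-lan.lua, and the letter correspondences are deliberately limited to a few uncontroversial ones.

```lua
-- Letters from different scripts that stand for roughly the same sound
-- receive the same sort value (values are invented for illustration).
local shared = {
  ["a"] = 3,  ["α"] = 3,  ["а"] = 3,   -- Latin, Greek, Cyrillic
  ["k"] = 23, ["κ"] = 23, ["к"] = 23,
  ["m"] = 27, ["μ"] = 27, ["м"] = 27,
  ["o"] = 31, ["ο"] = 31, ["о"] = 31,
  ["t"] = 41, ["τ"] = 41, ["т"] = 41,
}

-- Turn a UTF-8 string into a list of sort values.
local function sortkey(word)
  local key = {}
  -- one lead byte followed by any continuation bytes = one character
  for ch in word:gmatch("[\1-\127\194-\244][\128-\191]*") do
    key[#key + 1] = shared[ch] or 0   -- unknown characters sort first
  end
  return key
end

-- Comparator for table.sort: compare value by value, shorter key first.
local function before(x, y)
  local kx, ky = sortkey(x), sortkey(y)
  for i = 1, math.min(#kx, #ky) do
    if kx[i] ~= ky[i] then return kx[i] < ky[i] end
  end
  return #kx < #ky
end

local words = { "том", "κατα", "mat", "ата" }
table.sort(words, before)
print(table.concat(words, " "))  --> ата κατα mat том
```

With such a table, words in different scripts interleave by sound rather than by code point; strings whose keys are identical compare as equal, so a real implementation would need a tie-breaker.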

3. Is it intended that the digraph “ch” resolves (temporarily) to
U+FF01 FULLWIDTH EXCLAMATION MARK
(http://www.fileformat.info/info/unicode/char/ff01/index.htm) according
to line 72?
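
For context, replacing a digraph by a single otherwise unused character is a common way to give it a sort value of its own. Whether that is the intent in sort-lan.lua is exactly the open question, so the following is only a sketch: the names `placeholder`, `values` and `sortkey` and all integers are hypothetical; only the choice of U+FF01 comes from the question above.

```lua
-- U+FF01 encoded as UTF-8 bytes; stands in for the digraph "ch".
local placeholder = "\239\188\129"

-- Hypothetical Czech-style fragment: "ch" sorts between "h" and "i".
local values = { ["c"] = 5, ["h"] = 12, [placeholder] = 13, ["i"] = 14 }

-- Turn a word into a list of sort values, treating "ch" as one letter.
local function sortkey(word)
  local key = {}
  word = word:gsub("ch", placeholder)   -- collapse the digraph first
  -- one lead byte followed by any continuation bytes = one character
  for ch in word:gmatch("[\1-\127\194-\244][\128-\191]*") do
    key[#key + 1] = values[ch] or 0     -- unknown characters sort first
  end
  return key
end

print(table.concat(sortkey("chi"), " "))  --> 13 14
```

The placeholder never reaches the output; it exists only while the key is being built, which would explain the word “temporarily”.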

Feel free to state more general opinions on the sorting topic, as I am
playing with different ways of sorting my bibliography. I would be glad
of any advice.


†   I know this is impractical for many writing systems, and even within
the set of Latin or Greek based alphabets, how much precision you need
in sorting depends largely on the purpose at hand.
