Hi all,

working on a book project with index and bibliography, I discovered two small bugs (at least I think they are bugs):

1. index sorts uppercase letters after lowercase letters. Minimal example:

\starttext
\index{Aardvark}Aardvark
\index{azygous}azygous
\page
\setupregister[index][n=1] \placeregister[index]
\stoptext

I would expect azygous to follow Aardvark, but it is sorted before.

2. (Maybe not a bug, but a somewhat unfriendly behavior): When a \cite command refers to a non-existent key and sort=bbl, ConTeXt bombs out with a lua error:

! LuaTeX error ...text/tex/texmf-context/tex/context/base/bibl-tra.lua:77: attempt to compare nil with number
stack traceback:
...text/tex/texmf-context/tex/context/base/bibl-tra.lua:77: in function <...text/tex/texmf-context/tex/context/base/bibl-tra.lua:76>
[C]: in function 'sort'
...text/tex/texmf-context/tex/context/base/bibl-tra.lua:84: in function 'flush'
<main ctx instance>:1: in main chunk.
\typesetpubslist ...hacks.flush("\@@pbsorttype ")}
\doendoflist
\dodoplacepublications ...sttrue \typesetpubslist
\inpublistfalse \endgroup
...
l.37 \placepublications[criterium=all]

minimal example (the typo \cite[clarke199] instead of \cite[clarke1999a] is there on purpose to demonstrate the problem):

\setuppublications[state=start, sorttype=bbl, refcommand=authornum, numbering=yes]
\setuppublicationlist[samplesize={VSdK90},totalnumber=2]

\startpublication[k=champion2004,t=book, a={{Champion}},y=2004, n=10,s=Cha04]
\author[]{Craige~B.}[C.~B.]{}{Champion}
\pubyear{2004}
\title{Cultural Politics in Polybius's {\em Histories}}
\city{Berkeley}
\pubname{Univ. of California Pr.}
\stoppublication

\startpublication[k=clarke1999a,t=book, a={{Clarke}},y=1999b, n=9,s=Cla99b]
\author[]{Katherine}[K.]{}{Clarke}
\pubyear{1999\maybeyear{b}}
\title{Between Geography and History: Hellenistic Constructions of the Roman World}
\city{Oxford}
\pubname{Oxford UP}
\stoppublication

\starttext
\cite[champion2004]
\cite[clarke199]
\page
\placepublications[criterium=all]
\stoptext

Could this error be handled more gracefully, i.e. intercepted?

All best

Thomas
On 11-2-2010 16:52, Thomas A. Schmitz wrote:
Hi all,
working on a book project with index and bibliography, I discovered two small bugs (at least I think they are bugs):
1. index sorts uppercase letters after lowercase letters. Minimal example:
\starttext
\index{Aardvark}Aardvark
\index{azygous}azygous
\page
\setupregister[index][n=1] \placeregister[index]
\stoptext
I would expect azygous to follow Aardvark, but it is sorted before.
are you sure that that's the convention for english? it's easy to change it ...

\startluacode
sorters.mappings['en'] = {
    ["a"] =  2, ["b"] =  4, ["c"] =  6, ["d"] =  8, ["e"] = 10,
    ["f"] = 12, ["g"] = 14, ["h"] = 16, ["i"] = 18, ["j"] = 20,
    ["k"] = 22, ["l"] = 24, ["m"] = 26, ["n"] = 28, ["o"] = 30,
    ["p"] = 32, ["q"] = 34, ["r"] = 36, ["s"] = 38, ["t"] = 40,
    ["u"] = 42, ["v"] = 44, ["w"] = 46, ["x"] = 48, ["y"] = 50,
    ["z"] = 52,
    ["A"] =  1, ["B"] =  3, ["C"] =  5, ["D"] =  7, ["E"] =  9,
    ["F"] = 11, ["G"] = 13, ["H"] = 15, ["I"] = 17, ["J"] = 19,
    ["K"] = 21, ["L"] = 23, ["M"] = 25, ["N"] = 27, ["O"] = 29,
    ["P"] = 31, ["Q"] = 33, ["R"] = 35, ["S"] = 37, ["T"] = 39,
    ["U"] = 41, ["V"] = 43, ["W"] = 45, ["X"] = 47, ["Y"] = 49,
    ["Z"] = 51,
}
\stopluacode

\starttext
\index{Aardvark}Aardvark \par
\index{azygous}azygous
\placeregister[index][n=1]
\stoptext

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
On Feb 11, 2010, at 6:17 PM, Hans Hagen wrote:
are you sure that that's the convention for english? it's easy to change it ...
No, I'm not sure at all. All I can say is that a quick check in my scholarly books didn't bring up a single example where uppercase and lowercase were treated differently. If I apply your code, I will have the same problem with Azygous -> aardvark. How would I write the table so that lowercase and uppercase are not distinguished at all? I tried

\startluacode
sorters.mappings['en'] = {
    ["a"] =  1, ["b"] =  2, ["c"] =  3, ["d"] =  4, ["e"] =  5,
    ["f"] =  6, ["g"] =  7, ["h"] =  8, ["i"] =  9, ["j"] = 10,
    ["k"] = 11, ["l"] = 12, ["m"] = 13, ["n"] = 14, ["o"] = 15,
    ["p"] = 16, ["q"] = 17, ["r"] = 18, ["s"] = 19, ["t"] = 20,
    ["u"] = 21, ["v"] = 22, ["w"] = 23, ["x"] = 24, ["y"] = 25,
    ["z"] = 26,
}
\stopluacode

but that didn't work.

Thomas
On 11-2-2010 18:35, Thomas A. Schmitz wrote:
but that didn't work.
just give them the same code, so "A"=1, "a"=1

(we could make that an option: upper first, lower first, mixed)

Hans
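To make the difference between the two tables concrete, here is a minimal sketch in plain Lua, run outside ConTeXt. It does not use ConTeXt's sorter; `build_mapping`, `less_than` and `sorted` are hypothetical helpers that only mimic the idea of per-character weights. The "upper-first" mode reproduces the odd/even scheme from the earlier table, the "mixed" mode gives both cases the same code as suggested above.

```lua
-- Hypothetical helper: build a per-character weight table.
-- "upper-first" follows the odd/even scheme (A=1, a=2, B=3, b=4, ...);
-- "mixed" gives both cases the same weight (A=1, a=1, B=2, b=2, ...).
local function build_mapping(mode)
  local mapping = {}
  for i = 0, 25 do
    local upper = string.char(65 + i)
    local lower = string.char(97 + i)
    if mode == "mixed" then
      mapping[upper], mapping[lower] = i + 1, i + 1
    else
      mapping[upper], mapping[lower] = 2 * i + 1, 2 * i + 2
    end
  end
  return mapping
end

-- Compare two ASCII words weight by weight; unmapped characters weigh 0.
local function less_than(mapping, a, b)
  for i = 1, math.min(#a, #b) do
    local wa = mapping[a:sub(i, i)] or 0
    local wb = mapping[b:sub(i, i)] or 0
    if wa ~= wb then return wa < wb end
  end
  return #a < #b
end

-- Return a sorted copy so the input list stays untouched.
local function sorted(mapping, words)
  local copy = {}
  for i, w in ipairs(words) do copy[i] = w end
  table.sort(copy, function(a, b) return less_than(mapping, a, b) end)
  return copy
end

local words = { "Azygous", "aardvark" }
local upper_first = sorted(build_mapping("upper-first"), words)
local mixed = sorted(build_mapping("mixed"), words)
print(table.concat(upper_first, ", "))
print(table.concat(mixed, ", "))
```

With the odd/even weights "Azygous" still comes before "aardvark", which is the problem raised above; with identical weights for both cases "aardvark" comes first.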
On Feb 11, 2010, at 8:29 PM, Hans Hagen wrote:
just give them the same code, so "A"=1, "a"=1
(we could make that an option: upper first, lower first, mixed)
Hans
Thank you, Hans, that works nicely! It would be good to have this as an option. And I would vote for having the "mixed" setting as default. I wasn't even aware that there were indexes that sort according to case. All best Thomas
* Hans Hagen
are you sure that that's the convention for english? it's easy to change it ...
I've never seen an ordinary English index that was sorted by case. English indexes should definitely default to case-insensitive. (Has anyone here ever been asked for an index in English sorted by case?) -- David
Hi all, Hans, On Feb 11, 2010, at 6:17 PM, Hans Hagen wrote:
are you sure that that's the convention for english? it's easy to change it ...
we had this pretty old thread about sorting in indexes. AFAICS, the latest beta defaults to case-sensitive sorting. Two quick questions:

1. Is there a setup command that will make index sorting case-insensitive? The code above doesn't work anymore, so maybe you made it user-configurable now?

2. Is it really a good idea to make case-sensitive sorting the default in English? I can't remember seeing a single academic book in English that has this sort of index sorting.

All best

Thomas
On 3-10-2010 10:24, Thomas A. Schmitz wrote:
we had this pretty old thread about sorting in indexes. AFAICS, the latest beta defaults to case-sensitive sorting. Two quick questions:
1. Is there a setup command that will make index sorting case-insensitive? The code above doesn't work anymore, so maybe you made it user-configurable now?
indeed, and in a nice obscure way ...

\setuplayout[topspace=1cm,height=middle]
\setupbodyfont[11pt]

\starttext

\def\Test#1%
  {\vbox{{\bf#1}\blank\placeregister[index][language=cz,n=1,method={#1}]}\blank}

wanted result: oá öb Oč Öď Oo Öo oo öo Öq öř Oš oů \blank

\startcolumns[n=3]
    \Test{mc,mm,uc} \Test{mc,zm,uc} \Test{mc,pm,uc}
    \Test{zc,mm,uc} \Test{zc,zm,uc} \Test{zc,pm,uc}
    \Test{pc,mm,uc} \Test{pc,zm,uc} \Test{pc,pm,uc}
\stopcolumns

\page

wanted result: oá öb Oč Öď Oo Öo oo öo Öq öř Oš oů \blank

\startcolumns[n=3]
    \Test{mm,mc,uc} \Test{zm,mc,uc} \Test{pm,mc,uc}
    \Test{mm,zc,uc} \Test{zm,zc,uc} \Test{pm,zc,uc}
    \Test{mm,pc,uc} \Test{zm,pc,uc} \Test{pm,pc,uc}
\stopcolumns

\page

\dorecurse {2} {
    \page \recurselevel:
    \index{oá} \index{öb} \index{Oč} \index{Öď}
    \index{oo} \index{öo} \index{Oo} \index{Öo}
    \index{Öq} \index{öř} \index{Oš} \index{oů}
    done
}

\stoptext
2. Is it really a good idea to make case-sensitive sorting the default in English? I can't remember seeing a single academic book in English that has this sort of index sorting.
Currently Jano and I are figuring out some details (as Jano does the testing with more complex multilingual indices).

I have no preference ... we can configure each language independently using the method key in the entries in sort-lan.lua

As I seldom consult an index I have no clue what to expect or default to so feel free to tell me what the defaults should be. We now have predefined:

local predefinedmethods = {
    [variables.before] = "mm,mc,uc",
    [variables.after]  = "pm,mc,uc",
    [variables.first]  = "pc,mm,uc",
    [variables.last]   = "mc,mm,uc",
}

Hans
On Oct 3, 2010, at 12:29 PM, Hans Hagen wrote:
indeed, and in a nice obscure way ...
Give me a chance to understand :-) I tried looking in sort-ini.lua, but I couldn't figure out what the different methods meant. What do the abbreviations stand for? Also, I seem to obtain the desired case-insensitive sorting with method=zm,pc,uc but I also get spurious empty lines in the index. I'll try and come up with a minimal example.
2. Is it really a good idea to make case-sensitive sorting the default in English? I can't remember seeing a single academic book in English that has this sort of index sorting.
Hmm, if this is easy to configure, it doesn't make much of a difference. Just as a default, for English and German, I would suggest having no case-sensitivity. In German, umlauts are somewhat contentious, but nowadays, most people would sort them just like normal letters. But this is something that others on the list or on the wiki should express their opinion on.

Thanks, and all best

Thomas
On 3-10-2010 12:58, Thomas A. Schmitz wrote:
Give me a chance to understand :-) I tried looking in sort-ini.lua, but I couldn't figure out what the different methods meant. What do the abbreviations stand for? Also, I seem to obtain the desired case-insensitive sorting with method=zm,pc,uc but I also get spurious empty lines in the index. I'll try and come up with a minimal example.
mm zm pm : use mapping order, add -1, 0, +1 to different case and use shape info for missing entries (similar shapes)
mc zc pc : use mapping order, add -1, 0, +1 to different case
uc       : unicode order

so, you define a sequence of comparisons where for instance

U   -> order u +/- 1
\"u -> order of shape u +/- 1

etc .. a bit cryptic I admit ... some combinations give the same result depending on the vectors used. (Jano promised to write up something.)

numbers are sorted in a special way

so, at some point we simplify characters and start looking at shapes and sort based on shapes, which of course leads to clashes, so in a next step we look at unicodes etc etc
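One way to read this description is as a chain of comparisons in which the first method that sees a difference decides. The following plain-Lua sketch illustrates that reading for ASCII letters only. It is an assumption-laden model, not the code in sort-ini.lua: the function names are invented, the base weights are arbitrary, and the shape fallback of mm/zm/pm is left out entirely, so only mc, zc, pc and uc are modelled.

```lua
-- Offset applied to an upper-case letter, following the description:
-- m = minus (-1), z = zero (0), p = plus (+1). Illustrative names only.
local case_offset = { mc = -1, zc = 0, pc = 1 }

-- Weight of one ASCII letter under a case method. Base weights are spaced
-- by 2 so that the +/-1 offset never collides with a neighbouring letter.
local function weight(ch, method)
  local lower = ch:lower()
  local base = 2 * (lower:byte() - 96)      -- a=2, b=4, ...
  if ch ~= lower then base = base + case_offset[method] end
  return base
end

-- Turn a word into a list of numbers for one method ("uc" = byte values).
local function key(word, method)
  local k = {}
  for i = 1, #word do
    local ch = word:sub(i, i)
    k[i] = (method == "uc") and ch:byte() or weight(ch, method)
  end
  return k
end

-- Lexicographic comparison of two keys; returns -1, 0 or 1.
local function compare_keys(a, b)
  for i = 1, math.min(#a, #b) do
    if a[i] ~= b[i] then return a[i] < b[i] and -1 or 1 end
  end
  if #a == #b then return 0 end
  return #a < #b and -1 or 1
end

-- Try each method in turn; the first one that sees a difference decides.
local function sort_with(words, methods)
  local copy = {}
  for i, w in ipairs(words) do copy[i] = w end
  table.sort(copy, function(a, b)
    for method in methods:gmatch("[^,]+") do
      local r = compare_keys(key(a, method), key(b, method))
      if r ~= 0 then return r < 0 end
    end
    return false
  end)
  return copy
end

local words = { "azygous", "Azygous", "aardvark", "Aardvark" }
local mixed = sort_with(words, "zc,pc,uc")
local upper_first = sort_with(words, "mc,uc")
print(table.concat(mixed, " "))
print(table.concat(upper_first, " "))
```

In this model "zc,pc,uc" ignores case in the first pass and only uses it to break ties (aardvark Aardvark azygous Azygous), while "mc,uc" puts every word with an upper-case initial before the lower-case ones (Aardvark Azygous aardvark azygous).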
Hmm, if this is easy to configure, it doesn't make much of a difference. Just as a default, for English and German, I would suggest having no case-sensitivity. In German, umlauts are somewhat contentious, but nowadays, most people would sort them just like normal letters. But this is something that others on the list or on the wiki should express their opinion on.
best would be to have a test file per language with in comments the expected order; such tests should also provide foreign entries

for instance, how would you mix german and greek in your books; we probably need some specialized vectors then, which is possible as the sorting language can be configured independent from the text language

Hans
On Oct 3, 2010, at 5:10 PM, Hans Hagen wrote:
so, at some point we simplify characters and start looking at shapes and sort based on shapes which of course leads to clashes so in a next step we look at unicodes etc etc
OK, that makes sense. I'll play with it, but having a few choice pages on the wiki would be great!
best would be to have a test file per language with in comments the expected order; such tests should also provide foreign entries
for instance, how would you mix german and greek in your books; we probably need some specialized vectors then, which is possible as the sorting language can be configured independent from the text language
OK, I'll write something for German and English, but the thing is that we need more input what users expect. For mixtures with foreign languages, there might not be generally accepted rules at all, so people will define something on an ad-hoc basis. For Greek: I just looked at a dozen books here on my shelf. Most English books have a separate index for Greek terms; when they sort Greek terms with English words, they use transliteration. The problem with polytonic Greek is that so many different unicode characters need to have the same sort entry. If I ever see the necessity of setting this up, I'll be in touch off-list, but it's such an unusual thing that I think you shouldn't bother now. All best Thomas
On 2010-10-03 <17:43:21>, Thomas A. Schmitz wrote:
OK, I'll write something for German and English, but the thing is that we need more input what users expect. For mixtures with foreign languages, there might not be generally accepted rules at all, so people will define something on an ad-hoc basis.
Hi Thomas and others, technically speaking the problem is solved by ISO 14651.[1] In praxi multilingual sorting depends on local rules, of which “One index per script|language.” seems to be the most common. Some time ago I made an lpeg from the bnf in [1]. It matches the collation rules from [2], but as I couldn’t figure out how to map them onto context’s sorting mechanism I never got around to actually capture the information. As I won’t be having the time to try it with the new structure of sort-lan I guess I’ll just attach the peg grammar for anyone to use as a starting point. Unicode collation would be great to have in context.
transliteration. The problem with polytonic Greek is that so many different unicode characters need to have the same sort entry. If
Isn’t that just what the Greek rules in sort-lan.lua do? If not then it would be a bug.

····startsnippet·················································
definitions["gr"] = {
    entries = {
        ["α"] = "α", ["ά"] = "α", ["ὰ"] = "α", ["ᾶ"] = "α", ["ᾳ"] = "α",
        ["ἀ"] = "α", ["ἁ"] = "α", ["ἄ"] = "α", ["ἂ"] = "α", ["ἆ"] = "α",
        ["ἁ"] = "α", ["ἅ"] = "α", ["ἃ"] = "α", ["ἇ"] = "α", ["ᾁ"] = "α",
        ["ᾴ"] = "α", ["ᾲ"] = "α", ["ᾷ"] = "α", ["ᾄ"] = "α", ["ᾂ"] = "α",
        ["ᾅ"] = "α", ["ᾃ"] = "α", ["ᾆ"] = "α", ["ᾇ"] = "α", ["β"] = "β",
····stopsnippet··················································

Always nice to have a decent discussion on sorting ;)

Philipp

[1] http://standards.iso.org/ittf/PubliclyAvailableStandards/c044872_ISO_IEC_146...
[2] http://www.iso.org/ittf/ISO14651_2006_TABLE1_En.txt

--
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments
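The effect of such an entries table can be illustrated with a small plain-Lua sketch. It assumes UTF-8 input and uses only a handful of entries in the style of the snippet; `fold`, `bytes_less` and the sample words are made up for the illustration and are not part of sort-lan.lua.

```lua
-- A few entries in the style of the quoted snippet; the real table is
-- much longer and covers the whole alphabet.
local entries = {
  ["α"] = "α", ["ά"] = "α", ["ὰ"] = "α", ["ᾶ"] = "α",
  ["ἀ"] = "α", ["ἁ"] = "α", ["ἄ"] = "α", ["ἂ"] = "α",
}

-- Matches one multi-byte UTF-8 character (lead byte plus continuation bytes).
local utf8char = "[\194-\244][\128-\191]*"

-- Replace every character that has an entry; others are left untouched,
-- because gsub keeps the match when the table lookup yields nil.
local function fold(word)
  return (word:gsub(utf8char, entries))
end

-- Byte-wise comparison, independent of the current locale.
local function bytes_less(a, b)
  for i = 1, math.min(#a, #b) do
    local x, y = a:byte(i), b:byte(i)
    if x ~= y then return x < y end
  end
  return #a < #b
end

local words = { "βίος", "ἄλφα", "ἀγορά" }

-- Sorting on the raw bytes puts the accented alphas after beta ...
local raw = { "βίος", "ἄλφα", "ἀγορά" }
table.sort(raw, bytes_less)

-- ... sorting on the folded form puts them where a reader expects them.
table.sort(words, function(a, b) return bytes_less(fold(a), fold(b)) end)

print(table.concat(raw, " "))
print(table.concat(words, " "))
```

The point of the many-to-one table is visible in the second line of output: all variants of alpha share one sort position, so "ἀγορά" and "ἄλφα" precede "βίος".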
On 5-10-2010 2:15, Philipp Gesang wrote:
[1] http://standards.iso.org/ittf/PubliclyAvailableStandards/c044872_ISO_IEC_146... [2] http://www.iso.org/ittf/ISO14651_2006_TABLE1_En.txt
I'll have a look at it when I've time for it (I didn't know that doc; it's more fun figuring it out oneself anyway).

Hans
On Oct 5, 2010, at 2:15 PM, Philipp Gesang wrote:
Hi Thomas and others,
technically speaking the problem is solved by ISO 14651.[1]
In praxi multilingual sorting depends on local rules, of which “One index per script|language.” seems to be the most common.
Yes, that's what I was trying to say. In practice, hardly anyone will want an individual index for Spanish if they have just two Spanish words in an English book. And someone (me) might say that they want three Greek terms in their German index at logical places.
Isn’t that just what the Greek rules in sort-lan.lua do? If not then it would be a bug.
Oh yes, you're right, I missed that. Thanks for pointing that out! Thomas
On 2010-10-05 <15:29:38>, Thomas A. Schmitz wrote:
And someone (me) might say that they want three Greek terms in their German index at logical places.
Try the definitions in the attachment. For three words only they will be fine. But if the count increases you will soon run into a situation where it’s not easy to determine where those “logical places” are. E.g. would you want the letter “υ” under latin “y” or “u”? Phonologically (might depend on your stance on historical phonology -- could be a minefield) you might find it reasonable to treat “ου” as “u” (or “ū” if that matters), but your audience might expect it at the graphetic location, latin “ou”, instead.

As you can see in the example, when mapping both omega and omicron onto Latin “o” the result is that “χρῶμα” will appear before “Χρόνος”, which looks a bit odd.

This ad-hoc solution is troublesome when two words (a German and a Greek one) occupy the same spot in the search order, like “Polyneikes” and “Πολυνείκης”. My index output is:

Polyneikes 2
Πολυνείκης 2
Polyneikes 3
Πολυνείκης 3

which should rather be

Polyneikes 2, 3
Πολυνείκης 2, 3

I guess there is some testing going on in order to determine whether to proceed with the current entry or switch to the next one. The position is the same, however the comparison with the last item fails and a new one is created instead. (Only guessing.) If you run into this problem you might have to ask Hans for advice.

Hth, Philipp
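The guess above can be played through in a small plain-Lua model. Everything here is hypothetical (the `shape` table, the record layout, `build_index`); it is not how ConTeXt's register code works, only an illustration of why entries that share a sort position but differ in spelling can end up interleaved, and how a final tie-break on the entry text groups them again.

```lua
-- Hypothetical "shape" keys: both spellings collapse onto one Latin string.
local shape = { ["Polyneikes"] = "polyneikes", ["Πολυνείκης"] = "polyneikes" }

local records = {
  { entry = "Polyneikes", page = 2 }, { entry = "Πολυνείκης", page = 2 },
  { entry = "Polyneikes", page = 3 }, { entry = "Πολυνείκης", page = 3 },
}

-- Byte-wise comparison, independent of the current locale.
local function bytes_less(a, b)
  for i = 1, math.min(#a, #b) do
    local x, y = a:byte(i), b:byte(i)
    if x ~= y then return x < y end
  end
  return #a < #b
end

local function build_index(recs, use_tiebreak)
  local sorted = {}
  for i, r in ipairs(recs) do
    sorted[i] = { entry = r.entry, page = r.page, seq = i }
  end
  table.sort(sorted, function(a, b)
    local ka, kb = shape[a.entry], shape[b.entry]
    if ka ~= kb then return ka < kb end
    -- Optional last resort: compare the entry text itself.
    if use_tiebreak and a.entry ~= b.entry then
      return bytes_less(a.entry, b.entry)
    end
    if a.page ~= b.page then return a.page < b.page end
    return a.seq < b.seq                 -- keep input order for full ties
  end)
  -- Merge neighbours that carry exactly the same entry text.
  local lines, last = {}, nil
  for _, r in ipairs(sorted) do
    if last and last.entry == r.entry then
      last.pages[#last.pages + 1] = r.page
    else
      last = { entry = r.entry, pages = { r.page } }
      lines[#lines + 1] = last
    end
  end
  for i, l in ipairs(lines) do
    lines[i] = l.entry .. " " .. table.concat(l.pages, ", ")
  end
  return lines
end

local interleaved = build_index(records, false)
local grouped = build_index(records, true)
print(table.concat(interleaved, "\n"))
print(table.concat(grouped, "\n"))
```

Without the tie-break the model prints four separate lines, as in the reported output; with it, the two spellings are grouped and their page numbers merged.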
On 5-10-2010 11:17, Philipp Gesang wrote:
I guess there is some testing going on in order to determine whether to proceed with the current entry or switch to the next one. The position is the same, however the comparison with the last item fails and a new one is created instead. (Only guessing.)
it's a sequence of tests per comparison, like

Polyneikes
polyneikes % lowercased
polyneikes % shapes
Polyneikes % unicode

Πολυνείκης
Πολυνείκης % lowercased
polyneikes % shapes
Πολυνείκης % unicode

casing and shapes depend on the mapping vectors and the order can be influenced, you can see this in action with

\enabletrackers[sorters.tests]

Hans
On 2010-10-05 <23:27:33>, Hans Hagen wrote:
On 5-10-2010 11:17, Philipp Gesang wrote:
I guess there is some testing going on in order to determine whether to proceed with the current entry or switch to the next one. The position is the same, however the comparison with the last item fails and a new one is created instead. (Only guessing.)
it's a sequence of tests per comparison, like
Polyneikes
polyneikes % lowercased
polyneikes % shapes
I assume by “shapes” you mean the base symbol (all diacritics stripped).
Polyneikes % unicode
Πολυνείκης
Πολυνείκης % lowercased
polyneikes % shapes
Πολυνείκης % unicode
casing and shapes depends on the mapping vectors and the order can be influenced, you can see this in action with
\enabletrackers[sorters.tests]
Bingo! The tracker instantly revealed a really nasty flaw in the German standard transcription for Greek: “υ” is normally converted to Latin “y”, but is retained as Latin “u” in diphthongs like “ευ” and “ηυ”. So with the sorting definition I posted I get amongst the results:

sorters > Kapaneys > Kapaneus

because all “υ” are lazily mapped to “y”. Thus, for those occasional three words per book, determining the sorting position by hand (e.g. “\index[Kapaneus]{Καπανεύς}”) might be less prone to error.

Thanks for the hint and sorry for posting a non-solution,
Philipp
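The flaw is easy to reproduce with a character-by-character mapping, and one possible way around it is to rewrite the diphthongs before the single letters are mapped. The plain-Lua sketch below assumes UTF-8 input and covers only the letters of the example word; the tables and the `transliterate` function are illustrative and have nothing to do with the definitions posted earlier in the thread.

```lua
-- Accent and case folding for just the characters in the example word;
-- a real table would need the whole alphabet. All names are illustrative.
local unaccent = { ["ύ"] = "υ", ["Κ"] = "κ", ["ς"] = "σ" }
local letters  = { ["κ"] = "k", ["α"] = "a", ["π"] = "p", ["ν"] = "n",
                   ["ε"] = "e", ["η"] = "e", ["σ"] = "s", ["υ"] = "y" }

-- Matches one multi-byte UTF-8 character (lead byte plus continuation bytes).
local utf8char = "[\194-\244][\128-\191]*"

local function transliterate(word, diphthong_aware)
  -- Step 1: strip accents and case so that "εύ" becomes "ευ".
  local s = word:gsub(utf8char, unaccent)
  -- Step 2 (optional): rewrite the upsilon of a diphthong to Latin "u"
  -- before the letter-by-letter mapping can turn it into "y".
  if diphthong_aware then
    s = s:gsub("ευ", "εu"):gsub("ηυ", "ηu")
  end
  -- Step 3: map the remaining Greek letters one by one.
  return (s:gsub(utf8char, letters))
end

local lazy = transliterate("Καπανεύς", false)
local aware = transliterate("Καπανεύς", true)
print(lazy)
print(aware)
```

The lazy mapping yields "kapaneys", the diphthong-aware one "kapaneus". For a handful of words per book, setting the sort key by hand as suggested above remains the simpler route.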
On 5-10-2010 11:55, Philipp Gesang wrote:
I assume by “shapes” you mean the base symbol (all diacritics stripped).
indeed (and we might need to add/patch a few more shcodes to char-def.lua if needed)

Hans
On 11-2-2010 16:52, Thomas A. Schmitz wrote:
2. (Maybe not a bug, but a somewhat unfriendly behavior): When a \cite command refers to a non-existent key and sort=bbl, ConTeXt bombs out with a lua error:
so what do you expect? to drop that entry? or else, what default key to use?

Hans
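Both options raised in this question can be sketched in a few lines of plain Lua. The sketch does not reproduce bibl-tra.lua; the `order` table, the key list and the helper names are hypothetical, and the numbers are simply the n= values from the example publications, on the assumption that the sort compares some such per-entry number that is nil for an unknown key.

```lua
-- Hypothetical data: every known key has a numeric position; the misspelt
-- key "clarke199" has none, so a plain numeric comparison would fail with
-- "attempt to compare nil with number".
local order = { champion2004 = 10, clarke1999a = 9 }
local cited = { "champion2004", "clarke199", "clarke1999a" }

-- Option 1: drop unknown keys before sorting and remember them for a warning.
local function split_known(keys)
  local known, unknown = {}, {}
  for _, k in ipairs(keys) do
    if order[k] then
      known[#known + 1] = k
    else
      unknown[#unknown + 1] = k
    end
  end
  return known, unknown
end

-- Option 2: keep unknown keys but sort them after all known entries.
local function safe_less(a, b)
  local na, nb = order[a] or math.huge, order[b] or math.huge
  if na ~= nb then return na < nb end
  return a < b                      -- stable-looking order among unknowns
end

local known, unknown = split_known(cited)
table.sort(known, function(a, b) return order[a] < order[b] end)
for _, k in ipairs(unknown) do
  print("warning: unknown citation key " .. k)
end

table.sort(cited, safe_less)
print(table.concat(known, ", "))
print(table.concat(cited, ", "))
```

Either way the run continues and the typo shows up as a warning or as a trailing entry instead of a Lua traceback.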
participants (4): David Rogers, Hans Hagen, Philipp Gesang, Thomas A. Schmitz