Arabic index entries
Arabic index entries are all listed under "unknown" instead of under their respective Arabic letters. I'm not sure whether this is a bug or a misconfiguration on my side. See the attached example.

Regards,
--
Khaled Hosny
Arabic localizer and member of Arabeyes.org team
On Thu, 19 Jun 2008 18:23:05 -0600, Khaled Hosny
Arabic index entries are all listed under "unknown" instead of under their respective Arabic letters. I'm not sure whether this is a bug or a misconfiguration on my side. See the attached example.
We need to include arabic-farsi-urdu etc. databases in the distro. If Hans can tell us what file to emulate/edit etc.... Best wishes Idris -- Professor Idris Samawi Hamid, Editor-in-Chief International Journal of Shi`i Studies Department of Philosophy Colorado State University Fort Collins, CO 80523
On Fri, Jun 20, 2008 at 3:38 AM, Idris Samawi Hamid
We need to include arabic-farsi-urdu etc. databases in the distro. If Hans can tell us what file to emulate/edit etc....
Index sorting in MkIV currently works only for English, Dutch, Czech and German. Take a look at "sort-lan.lua" to see what you have to do.

Wolfgang
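To illustrate why entries can end up under "unknown": a sorter that only knows certain alphabets has no section letter to file an Arabic entry under. The following is a minimal Python sketch of that idea only. It is not the contents or structure of "sort-lan.lua"; the `ALPHABETS` table, the `group_entries` helper and the five-letter Arabic excerpt are assumptions made for illustration.

```python
# Hypothetical per-language tables mapping a first letter to its rank.
# This is NOT the structure of sort-lan.lua, only an illustration.
ALPHABETS = {
    "en": {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")},
    # Assumed excerpt: alef, beh, teh, theh, jeem (first five letters only).
    "ar": {ch: i for i, ch in enumerate("\u0627\u0628\u062A\u062B\u062C")},
}


def group_entries(entries, language):
    """Group index entries by first letter; unmapped letters go to 'unknown'."""
    table = ALPHABETS.get(language, {})
    groups = {}
    for entry in entries:
        first = entry[:1].lower()
        section = first if first in table else "unknown"
        groups.setdefault(section, []).append(entry)
    # Order sections by alphabet rank; 'unknown' is placed last.
    order = sorted(groups, key=lambda s: table.get(s, len(table)))
    # Within a section this sketch falls back to plain code point order.
    return {section: sorted(groups[section]) for section in order}


if __name__ == "__main__":
    # Two Arabic words (starting with beh and teh) and one English word.
    words = ["\u0628\u0627\u0628", "\u062A\u0645\u0631", "apple"]
    print(group_entries(words, "en"))  # Arabic words fall under 'unknown'
    print(group_entries(words, "ar"))  # Arabic words get their own letters
```

With only the English table available, both Arabic words land in the "unknown" section, which matches the symptom reported above; once a table for the language exists, they are filed under their own letters.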
On Fri, 20 Jun 2008 00:14:49 -0600, Wolfgang Schuster
"sort-lan.lua"
Thanks, Wolfgang! Best wishes, Idris
Idris Samawi Hamid wrote:
We need to include arabic-farsi-urdu etc. databases in the distro. If Hans can tell us what file to emulate/edit etc....
first we need to discuss the logic ... say that we have a sequence of chars ... do we need to erase the vowels? etc ----------------------------------------------------------------- Hans Hagen | PRAGMA ADE Ridderstraat 27 | 8061 GH Hasselt | The Netherlands tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl -----------------------------------------------------------------
On Fri, Jun 20, 2008 at 09:34:40AM +0200, Hans Hagen wrote:
first we need to discuss the logic ... say that we have a sequence of chars ... do we need to erase the vowels? etc
Erase vowels as in not counting them? Then yes, we should only respect full letters. We might also need to strip the Arabic definite article "ال", but this will be tricky since there are words that start with it. Maybe we had better have a syntax like \index[a]{entry}, where this entry would be filed under "a", or do we already have this?

Regards, Khaled
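The two normalizations discussed here (ignoring vowel marks, and optionally dropping the definite article) can be sketched as a sort-key function. This is a minimal Python sketch under stated assumptions, not how MkIV does or will do it: the `sort_key` name, the `strip_article` flag, and the choice of U+064B..U+0652 plus U+0670 as the set of "vowels" are all assumptions for illustration.

```python
# Arabic short-vowel and related marks: U+064B..U+0652, plus the
# superscript alef U+0670. Treating exactly this set as "vowels" is an
# assumption of this sketch.
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)} | {"\u0670"}

# The definite article: alef followed by lam.
ARTICLE = "\u0627\u0644"


def sort_key(entry, strip_article=False):
    """Return a sort key that ignores vowel marks and, optionally, the article."""
    bare = "".join(ch for ch in entry if ch not in HARAKAT)
    # Blind prefix stripping is the tricky part raised in the thread: some
    # words begin with alef + lam as part of the word itself, so stripping
    # is opt-in here rather than automatic.
    if strip_article and bare.startswith(ARTICLE) and len(bare) > len(ARTICLE):
        bare = bare[len(ARTICLE):]
    return bare


if __name__ == "__main__":
    vocalized = "\u0643\u0650\u062A\u064E\u0627\u0628"    # kaf, kasra, teh, fatha, alef, beh
    with_article = "\u0627\u0644\u0643\u062A\u0627\u0628"  # alef, lam, kaf, teh, alef, beh
    print(sort_key(vocalized))
    print(sort_key(with_article, strip_article=True))
```

Because the function cannot tell an article from a word that merely starts with the same two letters, an explicit per-entry key remains the safer way to handle the ambiguous cases.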
Khaled Hosny wrote:
Erase vowels as in not counting them? Then yes, we should only respect full letters. We might also need to strip the Arabic definite article "ال", but this will be tricky since there are words that start with it. Maybe we had better have a syntax like \index[a]{entry}, where this entry would be filed under "a", or do we already have this?
you can provide an optional sort key indeed

Hans
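The effect of an optional sort key can be modelled in a few lines. This is a minimal Python sketch of the idea behind \index[a]{entry} as described in the thread, not of how ConTeXt processes keys internally; the `(key, text)` pair representation and the `sort_entries` helper are hypothetical.

```python
def sort_entries(entries):
    """Sort (key, text) pairs; an explicit key, when given, wins over the text."""
    return sorted(
        entries,
        key=lambda pair: pair[0] if pair[0] is not None else pair[1],
    )


# Each pair is (optional sort key, printed entry text).
entries = [
    (None, "zebra"),
    ("a", "The Apple"),  # the explicit key files this entry under "a"
    (None, "mango"),
]

print([text for _, text in sort_entries(entries)])
```

The printed text is left untouched; only its position in the index changes, which is why an explicit key sidesteps the article-stripping problem for individual entries.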
The issues of indexing, etc., probably come down to two questions: (a) Is it something European-derivative in reference to a work, or (b) is it something entirely for "native-speaking" use and expectations?

I've been on the Ivritex list for quite a while, and there have been some long-running issues on how to deal with mixed versus pure texts and what people ought to expect. I have seen considerable variance in Hebrew materials from the latter nineteenth century to today in which they, for example, consider the ex-height in relation to superdiacritica and subdiacritica, from nikkud to cantillation. They have had to tackle the issues of handling mixed-language versus non-mixed-language text. It's nontrivial.

Just from an historical perspective, at one time Latin and other languages concatenated the articles to the words, for example in the nomenclature Alcoran for the Qur'an. Today in indexing (I have used Cindex to do quite a few book indices) one generally drops the definite and indefinite articles of most languages. Even in contents and chapter headings, one avoids articles except in informal literature for entertainment consumption. That may be language-dependent, for in German and Greek one does have to use articles more than in English. Still, I have seldom seen an index with arthrous forms in any language.

CPS

On Fri, 2008-06-20 at 19:02 +0300, Khaled Hosny wrote:
Erase vowels as in not counting them? Then yes, we should only respect full letters. We might also need to strip the Arabic definite article "ال", but this will be tricky since there are words that start with it. Maybe we had better have a syntax like \index[a]{entry}, where this entry would be filed under "a", or do we already have this?
Regards, Khaled
Khaled Hosny wrote:
Arabic index entries are all listed under "unknown" instead of under their respective Arabic letters. I'm not sure whether this is a bug or a misconfiguration on my side. See the attached example.
btw, some of these things have to wait till i have adapted mkiv in a more rigorous way. for instance i'm currently rewriting the sectioning code, which is related to lists; in lists we need to let language and such travel with the entries; the same is true for the index

Hans
participants (5)
- Charles P. Schaum
- Hans Hagen
- Idris Samawi Hamid
- Khaled Hosny
- Wolfgang Schuster