Regimes to be supported; Comments?
Hello,

Some time ago there was a discussion about extending support for different regimes in ConTeXt. The list of (to-be-)supported regimes probably depends strongly on the implementation (ruby+iconv?).

I collected a preliminary list of candidate regimes and possible synonyms (some synonyms are listed for backward compatibility and have to stay), leaving out most of the Eastern encodings (not because they shouldn't be on the list, but because I'm completely ignorant about them). Hans suggested posting this to the mailing list first to get some useful comments and suggestions.

#####

The following question should probably go in a separate thread, but the topic is very similar.

In July 2006 Ljubljana will host people from around 85 countries. One of the very ambitious organizers has been dreaming for a couple of years of printing the participants' names (on honourable mentions, for example) both in Latin transcription and as they are written in the original script (assuming that the names are properly entered in a UTF-8 database). This is probably not possible for every single obscure language, but in general, does it sound like:

a) Good luck (I wouldn't want to be in your place)!
b) Take a good (commercial) program.
c) If you're ready to invest the rest of your time (forget about hobbies!), it's probably doable in LaTeX or ConTeXt by then.
č) Forget about TeX - it will be possible to solve this problem one day with Unicode & one of the new TeX engines. But until then it's not worth the effort, because any effort you invest will become obsolete in a couple of years.

To be honest, even some of the people who will translate the materials into their native languages will probably do that with paper, pencil & scanner.

#####

Mojca

And here are the encodings:

# ISO

ISO-8859-1    Western
ISO-8859-2    Central European
ISO-8859-3    South European
ISO-8859-4    Baltic
ISO-8859-5    Cyrillic
ISO-8859-6    Arabic
ISO-8859-7    Greek
ISO-8859-8    Hebrew Visual
ISO-8859-8-I  Hebrew (???) What is that?
ISO-8859-9    Turkish
ISO-8859-10   Nordic
ISO-8859-11   Thai
ISO-8859-13   Baltic
ISO-8859-14   Celtic
ISO-8859-15   Western
ISO-8859-16   Romanian

\defineregimesynonym[il*][iso-8859-*], *=1-16\12
\defineregimesynonym[latin*][iso-8859-*], *=1-16\12
\defineregimesynonym[cp819][iso-8859-1]
% I'm not sure that anyone needs these:
\defineregimesynonym[iso-ir-100][iso-8859-1]
\defineregimesynonym[iso-ir-101][iso-8859-2]
\defineregimesynonym[iso-ir-109][iso-8859-3]
\defineregimesynonym[iso-ir-110][iso-8859-4]
\defineregimesynonym[iso-ir-144][iso-8859-5]
\defineregimesynonym[iso-ir-127][iso-8859-6]
\defineregimesynonym[iso-ir-126][iso-8859-7]
\defineregimesynonym[iso-ir-138][iso-8859-8]
\defineregimesynonym[iso-ir-148][iso-8859-9]
\defineregimesynonym[iso-ir-157][iso-8859-10]
\defineregimesynonym[iso-ir-179][iso-8859-13]
\defineregimesynonym[iso-ir-199][iso-8859-14]
\defineregimesynonym[iso-ir-203][iso-8859-15]
\defineregimesynonym[iso-ir-226][iso-8859-16]
% backward compatibility
\defineregimesynonym[iso88595][iso-8859-5]

(recode also recognises "arabic", "greek", "cyrillic", "hebrew" as aliases for those encodings. I don't know if this is a good idea, as there are other charsets covering the same language groups as well.)

# APPLE

MacArabic
MacCeltic
MacCentralEuropean  % CentEur, CentralEurope or CentralEuropean? or all of them?
MacChineseSimplified
MacChineseTraditional
MacCroatian
MacCyrillic
MacDevanagari
MacDingbats
MacFarsi
MacGaelic
MacGreek
MacGujarati
MacGurmukhi
MacHebrew
MacIcelandic
MacInuit
MacJapanese
MacKeyboard
MacKorean
MacRoman
MacRomanian
MacSymbol
MacThai
MacTurkish
MacUkrainian

\defineregimesynonym[MacCE][MacCentralEuropean]
\defineregimesynonym[mac][MacRoman]
\defineregimesynonym[maccyr][MacCyrillic]
\defineregimesynonym[macukr][MacUkrainian]

(I also need some help here: sometimes Mac encodings are named using adjectives, sometimes using nouns, like Ukraine/Ukrainian. Should only one of them (which?) be used, or both? On the Unicode page, Mac encodings appear twice. The second time they are listed under Microsoft/Apple, containing MacCyrillic, MacGreek, MacIceland, MacLatin2, MacRoman, MacTurkish. I didn't really see the point of that.)

# IBM

% essentially the same as under Microsoft, with some minor changes
% (to be processed manually, if these are to be supported)

# MICROSOFT

EBCDIC  % plenty of them are missing on the web
cp037
cp500
cp875
cp1026

PC
cp437   LatinUS
cp737   Greek
cp775   BaltRim
cp850   Latin1
cp852   Latin2
cp855   Cyrillic
cp857   Turkish
cp860   Portuguese
cp861   Icelandic
cp862   Hebrew
cp863   CanadaF
cp864   Arabic
cp865   Nordic
cp866   Cyrillic - Russian
cp869   Greek
cp874   Thai

WINDOWS
cp874   Thai (repeated for some unknown reason)
cp932   Japanese
cp936   PRC GBK
cp949   Korean
cp950   Chinese
cp1250  Central European
cp1251  Cyrillic
cp1252  Western
cp1253  Greek
cp1254  Turkish
cp1255  Hebrew
cp1256  Arabic
cp1257  Baltic
cp1258  Vietnamese

\defineregimesynonym[cp125*][windows-125*], *=0-8
% backward compatibility
\defineregimesynonym[windows][cp1252]
% there are some other possibilities:
% ms-ee, ms-cyrl, ms-ansi, ms-greek, ms-turk, ms-hebr, ms-arab, ...
% does anyone think that they are needed?

% Not online at Unicode, but available somewhere already:
VISCII
TCVN
isoir111
\defineregimesynonym[isoir111][iso-ir-111]

#### Some very confusing part (I should leave it out) ####

# MISC (? probably none of them to be processed)

AtariST
cp424   Hebrew
cp856   Hebrew
cp1006  Arabic

# NEXT

NextStep (What's that???)
next

% Missing in the Unicode mappings (online)
TIS-620 Thai

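The wildcard lines in the list above (for example `[cp125*][windows-125*], *=0-8`) read as shorthand for a series of individual definitions rather than as literal ConTeXt input. A minimal sketch in Python of how such a line might be expanded, assuming that `*` stands for each number in the stated range and that `1-16\12` means "1 to 16 without 12"; the helper name `expand_synonyms` is made up for illustration:

```python
def expand_synonyms(alias_pattern, target_pattern, values):
    """Expand one wildcard line into explicit \\defineregimesynonym lines.

    The '*' in both patterns is replaced by each value in turn.
    (Hypothetical helper; not part of ConTeXt.)
    """
    return [
        "\\defineregimesynonym[{}][{}]".format(
            alias_pattern.replace("*", str(v)),
            target_pattern.replace("*", str(v)),
        )
        for v in values
    ]


# "*=0-8" from the message: cp1250 ... cp1258
windows_lines = expand_synonyms("cp125*", "windows-125*", range(0, 9))

# "*=1-16\12" read as "1 to 16 without 12" (an assumption about the notation)
iso_parts = [n for n in range(1, 17) if n != 12]
il_lines = expand_synonyms("il*", "iso-8859-*", iso_parts)

if __name__ == "__main__":
    print("\n".join(windows_lines + il_lines))
```

Generating the lines this way keeps the list of synonyms in one place, so adding or dropping a code page only means changing a range.
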
Hi Mojca,

Re: encodings, this page may be of help:

http://www.kostis.net/charsets/

I personally prefer to use iconv as a preprocessor (any to utf-8), so I don't really care all that much about supported encodings. I have some remarks anyway, of course ;-)

I think that some of the encodings on your list are more like "keyboard mappings" than like actual input encodings (some MacXXX ones, for instance).

The 'original' MICROSOFT/PC ones and EBCDIC have probably all fallen into disuse by now in 'normal operations'. I would not bother with them.

Cheers, Taco

Mojca Miklavec wrote:
a) Good luck (I wouldn't want to be in your place)!
b) Take a good (commercial) program.
c) If you're ready to invest the rest of your time (forget about hobbies!), it's probably doable in LaTeX or ConTeXt by then.
č) Forget about TeX - it will be possible to solve this problem one day with Unicode & one of the new TeX engines. But until then it's not worth the effort, because any effort you invest will become obsolete in a couple of years.

I'm missing an option: d) you need some editorial and TeX skill but otherwise this is quite doable with current TeX/ConTeXt.
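
The preprocessing approach mentioned earlier in this reply (convert any input encoding to UTF-8 before TeX sees the file) can be sketched in a few lines of Python. This is only an illustration of the idea, not the workflow actually used by anyone in this thread; the function name `to_utf8` and the file names are hypothetical, and ISO-8859-2 is just an example source encoding:

```python
import os
import tempfile


def to_utf8(src_path, dst_path, source_encoding):
    """Re-encode a text file from source_encoding to UTF-8."""
    # newline="" keeps the original line endings untouched
    with open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


def demo():
    """Write a small ISO-8859-2 file, convert it, and return the result."""
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "input-latin2.txt")  # hypothetical file name
    dst = os.path.join(workdir, "input-utf8.txt")    # hypothetical file name
    with open(src, "wb") as f:
        f.write("č š ž Č Š Ž".encode("iso-8859-2"))
    to_utf8(src, dst, "iso-8859-2")
    with open(dst, "rb") as f:
        return f.read().decode("utf-8")


if __name__ == "__main__":
    print(demo())
```

On the command line the same conversion is what `iconv -f ISO-8859-2 -t UTF-8` does. With this step in front, the typesetting side only has to handle a single regime (UTF-8), which is why the length of the supported-regime list matters less in that setup.
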
Participants (2): Mojca Miklavec, Taco Hoekwater