Hello,
Some time ago there was a discussion about extending support for
different regimes in ConTeXt. The list of (to-be-)supported regimes
probably depends strongly on the implementation (ruby+iconv?). I have
collected a preliminary list of candidate regimes and possible synonyms
(some synonyms are listed for backward compatibility and have to
remain), leaving out most of the eastern encodings (not because they
shouldn't be on the list, but because I know nothing about them).
Hans suggested posting this to the mailing list first to get some useful
comments and suggestions.
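One quick way to see how much the list depends on the converter is to ask a given implementation which candidate names it recognises. Below is a minimal sketch that uses Python's codec registry purely as a stand-in for ruby+iconv. The function name and the sample names are only illustrative, and a different converter will accept a different set of names.

```python
import codecs

def split_by_support(names):
    """Return (known, unknown) lists according to Python's codec registry."""
    known, unknown = [], []
    for name in names:
        try:
            codecs.lookup(name)      # raises LookupError for unknown names
            known.append(name)
        except LookupError:
            unknown.append(name)
    return known, unknown

# Illustrative candidates; the last one is deliberately bogus.
candidates = ["iso-8859-2", "cp1250", "no-such-regime"]
known, unknown = split_by_support(candidates)
print("recognised:", known)
print("not recognised:", unknown)
```

Running the same check against the real conversion backend would show which of the names below need a synonym and which are already understood.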
#####
The following question should probably go in a separate thread, but the
topic is closely related. In July 2006 Ljubljana will host people from
around 85 countries. One of the very ambitious organizers has been
dreaming for a couple of years of printing the participants' names
(on honourable mentions, for example) both in Latin transcription
and in their original script (assuming that the names
are properly entered in a UTF-8 database). This is probably not possible
for every single obscure language, but in general, does it sound like:
a) Good luck (I wouldn't want to be in your place)!
b) Take a good (commercial) program
c) If you're ready to invest the rest of your time (forget about
hobbies!), it's probably doable in LaTeX or ConTeXt by then
č) Forget about TeX - it will be possible to solve this problem one day
with Unicode & one of the new TeX engines. But until then, it's not
worth the effort, because any effort you invest will become obsolete
in a couple of years.
To be honest, even some of the people who will translate the materials
into their native languages will probably do that with paper, pencil & scanner.
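Before choosing between the options above, it might help to know how many different scripts actually occur in the database. Below is a rough sketch, assuming the names are already available as decoded UTF-8 strings. It takes the first word of each letter's Unicode character name as a crude proxy for the script, which is a heuristic and not a proper script-property lookup. The function names and the sample data are made up.

```python
import unicodedata
from collections import Counter

def scripts_in(name):
    """Crude script guess: first word of each letter's Unicode character name."""
    found = set()
    for ch in name:
        if ch.isalpha():
            # Fall back to "UNKNOWN" for code points without a name.
            found.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return found

def tally(names):
    """Count in how many names each (guessed) script occurs."""
    counts = Counter()
    for name in names:
        counts.update(scripts_in(name))
    return counts

sample = ["Mojca", "Иван", "Žiga"]   # made-up sample data
print(tally(sample))
```

The resulting counts only indicate which fonts and input regimes would be needed at all; they say nothing about how hard each script is to typeset.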
#####
Mojca
And here are the encodings:
# ISO
ISO-8859-1 Western
ISO-8859-2 Central European
ISO-8859-3 South European
ISO-8859-4 Baltic
ISO-8859-5 Cyrillic
ISO-8859-6 Arabic
ISO-8859-7 Greek
ISO-8859-8 Hebrew Visual
ISO-8859-8-I Hebrew (???) What is that?
ISO-8859-9 Turkish
ISO-8859-10 Nordic
ISO-8859-11 Thai
ISO-8859-13 Baltic
ISO-8859-14 Celtic
ISO-8859-15 Western
ISO-8859-16 Romanian
\defineregimesynonym[il*][iso-8859-*], *=1-16 except 12
\defineregimesynonym[latin*][iso-8859-*], *=1-4
% latin5-latin10 are iso-8859-9, -10, -13, -14, -15, -16 respectively
\defineregimesynonym[cp819][iso-8859-1]
% I'm not sure that anyone needs these:
\defineregimesynonym[iso-ir-100][iso-8859-1]
\defineregimesynonym[iso-ir-101][iso-8859-2]
\defineregimesynonym[iso-ir-109][iso-8859-3]
\defineregimesynonym[iso-ir-110][iso-8859-4]
\defineregimesynonym[iso-ir-144][iso-8859-5]
\defineregimesynonym[iso-ir-127][iso-8859-6]
\defineregimesynonym[iso-ir-126][iso-8859-7]
\defineregimesynonym[iso-ir-138][iso-8859-8]
\defineregimesynonym[iso-ir-148][iso-8859-9]
\defineregimesynonym[iso-ir-157][iso-8859-10]
\defineregimesynonym[iso-ir-179][iso-8859-13]
\defineregimesynonym[iso-ir-199][iso-8859-14]
\defineregimesynonym[iso-ir-203][iso-8859-15]
\defineregimesynonym[iso-ir-226][iso-8859-16]
% backward compatibility
\defineregimesynonym[iso88595][iso-8859-5]
(recode also recognises "arabic", "greek", "cyrillic", "hebrew" as
aliases for those encodings: I don't know if this is a good idea, as there
are other charsets covering the same language groups as well)
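The wildcard shorthand used above (`il*` with `*` running over 1-16 and skipping 12) is only notation for this post, so the real definitions would have to be written out one by one. Below is a small sketch that generates them. The helper name `expand_synonyms` is hypothetical, and the generated lines simply follow the pattern as written here, not any existing ConTeXt file.

```python
def expand_synonyms(alias_prefix, target_prefix, numbers):
    """Build one synonym definition line per number in `numbers`."""
    return [
        "\\defineregimesynonym[%s%d][%s%d]" % (alias_prefix, n, target_prefix, n)
        for n in numbers
    ]

# Parts 1..16, skipping 12 (there is no ISO-8859-12 in the list above).
iso_parts = [n for n in range(1, 17) if n != 12]

for line in expand_synonyms("il", "iso-8859-", iso_parts):
    print(line)
```

The same helper would cover the `cp125*` shorthand further down, for example with `expand_synonyms("cp125", "windows-125", range(0, 9))`.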
# APPLE
MacArabic
MacCeltic
MacCentralEuropean
% CentEur, CentralEurope or CentralEuropean? or all of them?
MacChineseSimplified
MacChineseTraditional
MacCroatian
MacCyrillic
MacDevanagari
MacDingbats
MacFarsi
MacGaelic
MacGreek
MacGujarati
MacGurmukhi
MacHebrew
MacIcelandic
MacInuit
MacJapanese
MacKeyboard
MacKorean
MacRoman
MacRomanian
MacSymbol
MacThai
MacTurkish
MacUkrainian
\defineregimesynonym[MacCE][MacCentralEuropean]
\defineregimesynonym[mac][MacRoman]
\defineregimesynonym[maccyr][MacCyrillic]
\defineregimesynonym[macukr][MacUkrainian]
(I also need some help here: sometimes Mac encodings are named using
adjectives, sometimes using nouns, like Ukraine/Ukrainian. Should only
one of them (which?) be used, or both? On the Unicode page, Mac
encodings appear twice, the second time under Microsoft/Apple,
containing MacCyrillic, MacGreek, MacIceland, MacLatin2, MacRoman,
MacTurkish. I didn't really get the point of that.)
# IBM
% essentially the same as under Microsoft, with some minor changes
(to be processed manually, if these are to be supported)
# MICROSOFT
EBCDIC % plenty of them are missing on the web
cp037
cp500
cp875
cp1026
PC
cp437 LatinUS
cp737 Greek
cp775 BaltRim
cp850 Latin1
cp852 Latin2
cp855 Cyrillic
cp857 Turkish
cp860 Portuguese
cp861 Icelandic
cp862 Hebrew
cp863 CanadaF
cp864 Arabic
cp865 Nordic
cp866 Cyrillic - Russian
cp869 Greek
cp874 Thai
WINDOWS
cp874 Thai (repeated for some unknown reason)
cp932 Japanese
cp936 PRC GBK
cp949 Korean
cp950 Chinese
cp1250 Central European
cp1251 Cyrillic
cp1252 Western
cp1253 Greek
cp1254 Turkish
cp1255 Hebrew
cp1256 Arabic
cp1257 Baltic
cp1258 Vietnamese
\defineregimesynonym[cp125*][windows-125*], *=0-8
% backward compatibility
\defineregimesynonym[windows][cp1252]
% there are some other possibilities:
% ms-ee, ms-cyrl, ms-ansi, ms-greek, ms-turk, ms-hebr, ms-arab, ...
% does anyone think that they are needed?
% These are not online at Unicode, but mappings already exist somewhere:
VISCII
TCVN
isoir111
\defineregimesynonym[isoir111][iso-ir-111]
#### Some very confusing part (I should leave it out) ####
# MISC (? probably none of them to be processed)
AtariST
cp424 Hebrew
cp856 Hebrew
cp1006 Arabic
# NEXT
NextStep (What's that???)
next
% Missing in Unicode mapping (online)
TIS-620 Thai