Regimes to be supported; Comments?

29 Jul 2005

      Hello,

Some time ago there was a discussion about extending support for 
different regimes in ConTeXt. The list of regimes to be supported 
probably depends strongly on the implementation (ruby+iconv?). I have 
collected a preliminary list of candidate regimes and possible synonyms 
(some synonyms are listed for backward compatibility and have to stay). 
I left out most of the Eastern encodings, not because they shouldn't be 
on the list, but because I know next to nothing about them.
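To make "depends on the implementation" a bit more concrete, here is a minimal sketch of what a converter-based backend might do. It looks an encoding up and, if the encoding exists, records which Unicode code point each byte 0x80-0xFF maps to. The sketch uses Python's built-in codec registry as a stand-in for iconv, so the names are Python codec names, not ConTeXt regime names. It only makes sense for 8-bit code pages, not for multi-byte ones like cp932. The function name `regime_table` is hypothetical.

```python
import codecs


def regime_table(codec_name):
    """Return {byte: Unicode code point} for bytes 0x80-0xFF.

    Returns None if the backend (here: Python's codec registry, standing in
    for iconv) does not know the encoding. Bytes that the code page leaves
    undefined are omitted from the table.
    """
    try:
        codecs.lookup(codec_name)
    except LookupError:
        return None  # encoding not supported by this backend at all
    table = {}
    for byte in range(0x80, 0x100):
        try:
            char = bytes([byte]).decode(codec_name)
        except UnicodeDecodeError:
            continue  # undefined position in this code page
        table[byte] = ord(char)
    return table


if __name__ == "__main__":
    candidates = ["iso8859_2", "cp1250", "mac_roman", "cp437", "no-such-regime"]
    for name in candidates:
        table = regime_table(name)
        if table is None:
            print(f"{name}: not supported by this backend")
        else:
            e8 = f"U+{table[0xE8]:04X}" if 0xE8 in table else "undefined"
            print(f"{name}: {len(table)} mapped bytes, byte 0xE8 -> {e8}")
```

Any name reported as unsupported would be a candidate either for dropping from the list or for a hand-made table.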

Hans suggested posting this to the mailing list first to collect useful 
comments and suggestions.

#####

The following question should probably go in a separate thread, but the 
topic is closely related. In July 2006 Ljubljana will host people from 
around 85 countries. One of the (very ambitious) organizers has been 
dreaming for a couple of years now of printing the participants' names 
(on honourable mentions, for example) both in Latin transcription and 
as they are written in the original script (assuming that the names 
are properly entered in a UTF-8 database). This is probably not possible 
for every single obscure language, but in general, does it sound like:
a) Good luck (I wouldn't want to be in your shoes)!
b) Get a good (commercial) program.
c) If you're ready to invest the rest of your time (forget about 
hobbies!), it's probably doable in LaTeX or ConTeXt by then.
č) Forget about TeX: one day it will be possible to solve this problem 
with Unicode and one of the new TeX engines, but until then it's not 
worth the effort, because anything you invest now will become obsolete 
in a couple of years.

To be honest, even some of the people who will translate the materials 
into their native languages will probably do so with paper, pencil and a scanner.

#####

Mojca

And here are the encodings:

# ISO
     ISO-8859-1  Western
     ISO-8859-2  Central European
     ISO-8859-3  South European
     ISO-8859-4  Baltic
     ISO-8859-5  Cyrillic
     ISO-8859-6  Arabic
     ISO-8859-7  Greek
     ISO-8859-8  Hebrew Visual
     ISO-8859-8-I Hebrew (???) What is that?
     ISO-8859-9  Turkish
     ISO-8859-10 Nordic
     ISO-8859-11 Thai
     ISO-8859-13 Baltic
     ISO-8859-14 Celtic
     ISO-8859-15 Western
     ISO-8859-16 Romanian

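     \defineregimesynonym[il*][iso-8859-*], *=1-16 except 12
     \defineregimesynonym[latin*][iso-8859-*], *=1-4
     \defineregimesynonym[latin5][iso-8859-9]
     \defineregimesynonym[latin6][iso-8859-10]
     \defineregimesynonym[latin7][iso-8859-13]
     \defineregimesynonym[latin8][iso-8859-14]
     \defineregimesynonym[latin9][iso-8859-15]
     \defineregimesynonym[latin10][iso-8859-16]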
     \defineregimesynonym[cp819][iso-8859-1]
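The `*` shorthand for the il* synonyms is easy to expand mechanically. Below is a minimal sketch that writes out the explicit `\defineregimesynonym` lines. It skips number 12 because ISO-8859-12 was never published. The helper name `expand_synonyms` is hypothetical.

```python
def expand_synonyms(alias_pattern, target_pattern, numbers, skip=()):
    """Substitute each number (except those in `skip`) for '*' in both patterns
    and return the corresponding ConTeXt \\defineregimesynonym lines."""
    lines = []
    for n in numbers:
        if n in skip:
            continue
        alias = alias_pattern.replace("*", str(n))
        target = target_pattern.replace("*", str(n))
        lines.append(f"\\defineregimesynonym[{alias}][{target}]")
    return lines


if __name__ == "__main__":
    # *=1-16 except 12: ISO-8859-12 was abandoned, so there is nothing to point at
    for line in expand_synonyms("il*", "iso-8859-*", range(1, 17), skip={12}):
        print(line)
```

Calling it with `"cp125*"`, `"windows-125*"` and `range(0, 9)` would produce the cp1250 to cp1258 lines proposed for the Windows code pages further down.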

     % I'm not sure that anyone needs these:
     \defineregimesynonym[iso-ir-100][iso-8859-1]
     \defineregimesynonym[iso-ir-101][iso-8859-2]
     \defineregimesynonym[iso-ir-109][iso-8859-3]
     \defineregimesynonym[iso-ir-110][iso-8859-4]
     \defineregimesynonym[iso-ir-144][iso-8859-5]
     \defineregimesynonym[iso-ir-127][iso-8859-6]
     \defineregimesynonym[iso-ir-126][iso-8859-7]
     \defineregimesynonym[iso-ir-138][iso-8859-8]
     \defineregimesynonym[iso-ir-148][iso-8859-9]
     \defineregimesynonym[iso-ir-157][iso-8859-10]
     \defineregimesynonym[iso-ir-179][iso-8859-13]
     \defineregimesynonym[iso-ir-199][iso-8859-14]
     \defineregimesynonym[iso-ir-203][iso-8859-15]
     \defineregimesynonym[iso-ir-226][iso-8859-16]

     % backward compatibility
     \defineregimesynonym[iso88595][iso-8859-5]

     (recode also recognises "arabic", "greek", "cyrillic" and "hebrew" as 
aliases for these encodings. I don't know whether this is a good idea, 
since other charsets cover the same language groups as well.)
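Here is a small illustration of why such a generic alias is ambiguous. It assumes Python's codec registry behaves like recode here; Python also accepts "cyrillic" as an alias and resolves it to ISO-8859-5. The same byte 0xC0 is a different character in each Cyrillic code page. The helper `decode_byte` is hypothetical.

```python
import codecs

ALIAS = "cyrillic"
# Other Cyrillic code pages that the generic alias does NOT refer to
OTHER_CYRILLIC = ["cp1251", "koi8_r", "cp866"]


def decode_byte(byte, codec_name):
    """Decode a single byte with the given codec."""
    return bytes([byte]).decode(codec_name)


if __name__ == "__main__":
    resolved = codecs.lookup(ALIAS).name  # the one encoding the alias stands for
    print(f"{ALIAS!r} resolves to {resolved}")
    for name in [resolved] + OTHER_CYRILLIC:
        ch = decode_byte(0xC0, name)
        print(f"0xC0 in {name:10} -> {ch} (U+{ord(ch):04X})")
```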

# APPLE
     MacArabic
     MacCeltic
     MacCentralEuropean
% CentEur, CentralEurope or CentralEuropean? or all of them?
     MacChineseSimplified
     MacChineseTraditional
     MacCroatian
     MacCyrillic
     MacDevanagari
     MacDingbats
     MacFarsi
     MacGaelic
     MacGreek
     MacGujarati
     MacGurmukhi
     MacHebrew
     MacIcelandic
     MacInuit
     MacJapanese
     MacKeyboard
     MacKorean
     MacRoman
     MacRomanian
     MacSymbol
     MacThai
     MacTurkish
     MacUkrainian

     \defineregimesynonym[MacCE][MacCentralEuropean]
     \defineregimesynonym[mac][MacRoman]
     \defineregimesynonym[maccyr][MacCyrillic]
     \defineregimesynonym[macukr][MacUkrainian]

(I also need some help here: Mac encodings are sometimes named with 
adjectives and sometimes with nouns, like Ukraine/Ukrainian. Should only 
one form be used (and which one?), or both? On the Unicode page, Mac 
encodings appear twice; the second time under Microsoft/Apple, 
containing MacCyrillic, MacGreek, MacIceland, MacLatin2, MacRoman and 
MacTurkish. I didn't really understand the point of that.)

# IBM
     % essentially the same as under Microsoft, with some minor changes (to be processed manually if these are to be supported)
# MICROSOFT
     EBCDIC % many of them are missing on the web
         cp037
         cp500
         cp875
         cp1026
     PC
         cp437 LatinUS
         cp737 Greek
         cp775 BaltRim
         cp850 Latin1
         cp852 Latin2
         cp855 Cyrillic
         cp857 Turkish
         cp860 Portuguese
         cp861 Icelandic
         cp862 Hebrew
         cp863 CanadaF
         cp864 Arabic
         cp865 Nordic
         cp866 Cyrillic - Russian
         cp869 Greek
         cp874 Thai
     WINDOWS
         cp874  Thai (repeated here for some unknown reason)
         cp932  Japanese
         cp936  PRC GBK
         cp949  Korean
         cp950  Chinese

         cp1250 Central European
         cp1251 Cyrillic
         cp1252 Western
         cp1253 Greek
         cp1254 Turkish
         cp1255 Hebrew
         cp1256 Arabic
         cp1257 Baltic
         cp1258 Vietnamese
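One thing that might help decide about the EBCDIC pages: unlike the PC and Windows code pages, they do not keep ASCII in the lower half. In cp037, for instance, "A" is byte 0xC1. A regime for them would therefore presumably have to remap letters and digits too, not only the upper half. Below is a quick check, again assuming Python's codecs as a stand-in for the real backend. The helper `keeps_ascii` is hypothetical.

```python
# Printable ASCII range 0x20-0x7E as raw bytes
PRINTABLE_ASCII = bytes(range(0x20, 0x7F))


def keeps_ascii(codec_name):
    """True if the printable ASCII bytes decode to the same characters as in ASCII."""
    return PRINTABLE_ASCII.decode(codec_name) == PRINTABLE_ASCII.decode("ascii")


if __name__ == "__main__":
    # cp037 and cp500 are EBCDIC; the others are PC and Windows code pages
    for name in ["cp037", "cp500", "cp437", "cp850", "cp866", "cp1250", "cp1252"]:
        print(f"{name:7} keeps ASCII: {keeps_ascii(name)}")
```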

     \defineregimesynonym[cp125*][windows-125*], *=0-8

     % backward compatibility
     \defineregimesynonym[windows][cp1252]
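With these proposals, synonyms can point at other synonyms: `windows` points to `cp1252`, which the cp125* rule in turn maps to `windows-1252`. These lines suggest that the first argument of `\defineregimesynonym` is the new name and the second is the name it stands for. Under that reading, an implementation has to either follow such chains or point every synonym directly at the final regime. Below is a sketch of the first option, with a guard against cycles. The table `SYNONYMS` and the function `resolve_regime` are hypothetical.

```python
# Hypothetical synonym table built from the proposals in this mail
SYNONYMS = {
    "windows": "cp1252",        # backward compatibility
    "cp1252": "windows-1252",   # from the cp125* -> windows-125* rule
    "mac": "MacRoman",
    "MacCE": "MacCentralEuropean",
}


def resolve_regime(name, synonyms=SYNONYMS):
    """Follow synonym links until reaching a name that is not itself a synonym."""
    seen = []
    while name in synonyms:
        if name in seen:
            raise ValueError("synonym cycle: " + " -> ".join(seen + [name]))
        seen.append(name)
        name = synonyms[name]
    return name


if __name__ == "__main__":
    for alias in ["windows", "MacCE", "iso-8859-2"]:
        print(f"{alias} -> {resolve_regime(alias)}")
```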

     % there are some other possibilities:
     % ms-ee, ms-cyrl, ms-ansi, ms-greek, ms-turk, ms-hebr, ms-arab, ...
     % does anyone think they are needed?

% These are not available online at Unicode, but they already exist somewhere:
     VISCII
     TCVN
     isoir111

\defineregimesynonym[isoir111][iso-ir-111]

#### A very confusing part (I should probably leave it out) ####

# MISC (probably none of these need to be processed?)
     AtariST
     cp424 Hebrew
     cp856 Hebrew
     cp1006 Arabic

# NEXT
     NextStep (What's that???)
         next

% Missing in Unicode mapping (online)
     TIS-620 Thai
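As far as I know, ISO-8859-11 is TIS-620 plus a no-break space at 0xA0. A TIS-620 regime could therefore probably be derived from the ISO-8859-11 one. Below is a sketch that compares the two, assuming Python's `tis_620` and `iso8859_11` codecs reflect the standards. The helper names are hypothetical.

```python
def upper_half(codec_name):
    """Decode bytes 0xA0-0xFF, turning undefined positions into U+FFFD."""
    return bytes(range(0xA0, 0x100)).decode(codec_name, errors="replace")


def differing_bytes(a, b):
    """Byte values in 0xA0-0xFF that the two codecs decode differently."""
    pairs = zip(upper_half(a), upper_half(b))
    return [0xA0 + i for i, (x, y) in enumerate(pairs) if x != y]


if __name__ == "__main__":
    diffs = differing_bytes("tis_620", "iso8859_11")
    print("differing bytes:", [hex(d) for d in diffs])
```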

Mojca Miklavec

Taco Hoekwater
