Taco Hoekwater wrote:
Here's what I can come up with. At least a few are acceptable, like the horizontal bar. \textnumero exists, but is only reachable in Cyrillic encodings (fixable, I guess?), and the Greek and Vietnamese accents are also only usable in the correct encoding. I've used the \text... versions of the accents, but perhaps the actual commands (like \' and \~) are more correct.
Cheers, Taco
\starttext
\definecharacter texthorizontalbar {{--\kern 0pt--}}
\definecharacter textdong {\underbar{\dstroke}}
Thanks for those ...
\NC 0300 COMBINING GRAVE ACCENT \NC \textgrave \NC \NR
\NC 0309 COMBINING HOOK ABOVE \NC \texthookabove \NC \NR
\NC 0303 COMBINING TILDE \NC \texttilde \NC \NR
\NC 0301 COMBINING ACUTE ACCENT \NC \textacute \NC \NR
\NC 0323 COMBINING DOT BELOW \NC \textbottomdot \NC \NR
I may be wrong, but aren't those used only in combination with other characters? I don't know whether TeX (ConTeXt) can handle this (at least not yet). When I wrote the list a couple of days ago I forgot about that fact. If the accent came before the character, this could be replaced by "\buildtextaccent...", but here there is perhaps no solution without some additional macros. (And since the Vietnamese seem to be satisfied with viscii and utf for now, supporting cp1258 is not crucial.)

I double-checked the differences between the existing regimes and the ones that were automatically produced by a script. The list of regimes that are "ripe" for support is thus:

cp125[ 0 | *1 | *2 | 3 | 4 | 7 ]
iso-8859-[ *1 | *2 | 3 | 4 | *5 | *7 | 9 | 13 | *15 | 16 ]
*viscii (with glyph names instead of \"\u\...)

(The ones marked with a star are already supported, perhaps with some inconsistencies. Not supported: Hebrew, Arabic, Vietnamese? for cp125X and Arabic, Thai and Celtic for iso-8859-X.)

I'll send the files (the full content is already on my page), but I need to know how to split/group them (I guess it would be a bad idea to have one file for each encoding). Should there be one file for iso-8859 and one for the Windows encodings? What about the regimes that are already supported? I would like to move at least "regi-win" (which has 8 wrong definitions anyway) to a "less discriminating" place; I don't know what to do with Greek and Cyrillic.

And another set of questions:

1. Can someone check for (in)consistencies between greekupsilondiaeresis and greekupsilondialytika? It looks like the same glyph is named differently in different places (functionality may break).

2. What to do with

{\cyrillicGJE} {\'\cyrillicG} % 0403 CYRILLIC CAPITAL LETTER GJE
{\cyrillicgje} {\'\cyrillicg} % 0453 CYRILLIC SMALL LETTER GJE
{\cyrillicKJE} {\'\cyrillicK} % 040C CYRILLIC CAPITAL LETTER KJE
{\cyrillickje} {\'\cyrillick} % 045C CYRILLIC SMALL LETTER KJE
{\cyrillicgheupturn} {\cyrillicgup} % 0491 CYRILLIC SMALL LETTER GHE WITH UPTURN

Which variant is better? Would it make sense to define

\definecharacter cyrillicGJE {\buildtextaccent\textacute\cyrillicG}
\defineaccent ' \cyrillicG {\cyrillicGJE}

and then use \cyrillicGJE consistently?

3. PLEASE FIX: in enco-def.tex, replace \cdots with something else (\dots, I suppose, but I'm not sure):

\definecharacter textellipsis {\mathematics\cdots}

(I guess this "bug" was the reason for changing some definitions in regimes/encodings elsewhere.) Should \textellipsis be used for "2026 HORIZONTAL ELLIPSIS", or something else?

4. \softhyphen, \hyphen or \- for "00AD SOFT HYPHEN"?

5. Urgently: what to do with quotations (without language discrimination if possible)?

% 201A SINGLE LOW-9 QUOTATION MARK
\quotesinglebase vs. \lowerleftsingleninequote
% 201E DOUBLE LOW-9 QUOTATION MARK
\quotedblbase vs. \lowerleftdoubleninequote
% 2018 LEFT SINGLE QUOTATION MARK
\quoteleft vs. \upperleftsinglesixquote
% 2019 RIGHT SINGLE QUOTATION MARK
\quoteright vs. \upperrightsingleninequote
% 201C LEFT DOUBLE QUOTATION MARK
\quotedblleft vs. \upperleftdoublesixquote
% 201D RIGHT DOUBLE QUOTATION MARK
\quotedblright vs. \upperrightdoubleninequote
% 2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK
\guilsingleleft vs. \leftsubguillemot
% 203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
\guilsingleright vs. \rightsubguillemot
% 00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
\leftguillemot vs. \greekleftquot
(are Greek quotations treated specially, or what is this doing in regi-grk?)
% 00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
\rightguillemot vs. \greekrightquot vs. \prewordbreak\rightguillemot
(in my view the last one may be better, but it is not fair since it's language dependent: it may be OK for French, but not for German or vice versa; perhaps a language-sensitive macro could be inserted at this place?)

6. \textnumero, 0x2116 (and perhaps some other characters) should be added to unicode vector 33.

7. The files regi-il1 and regi-win have many inconsistencies. I would like to suggest the following renamings:

windows -> cp1252
il1 -> iso-8859-1
il2 -> iso-8859-2
iso88595 -> iso-8859-5
grk -> iso-8859-7 (the new one)

and adding the following lines somewhere:

% or perhaps the other way around
\defineregimesynonym[utf-8][utf]
\defineregimesynonym[utf8][utf]

\defineregimesynonym[windows-1250][cp1250]
\defineregimesynonym[windows-1251][cp1251]
\defineregimesynonym[windows-1252][cp1252]
\defineregimesynonym[windows-1253][cp1253]
\defineregimesynonym[windows-1254][cp1254]
%defineregimesynonym[windows-1255][cp1255] % not supported yet (Hebrew)
%defineregimesynonym[windows-1256][cp1256] % not supported yet (Arabic)
\defineregimesynonym[windows-1257][cp1257]
%defineregimesynonym[windows-1258][cp1258] % not supported yet (Vietnamese)

% for historical reasons
\defineregimesynonym[windows][cp1252]

% 5 - Cyrillic
% 6 - Arabic (not supported)
% 7 - Greek
% 8 - Hebrew (3 signs missing)
% 11 - Thai (not supported)
\defineregimesynonym[il1][iso-8859-1]
\defineregimesynonym[il2][iso-8859-2]
\defineregimesynonym[il3][iso-8859-3]
\defineregimesynonym[il4][iso-8859-4]
\defineregimesynonym[il5][iso-8859-9]
\defineregimesynonym[il6][iso-8859-10]
\defineregimesynonym[il7][iso-8859-13]
%defineregimesynonym[il8][iso-8859-14] % not supported yet
\defineregimesynonym[il9][iso-8859-15]
\defineregimesynonym[il10][iso-8859-16]

\defineregimesynonym[latin1][iso-8859-1]
\defineregimesynonym[latin2][iso-8859-2]
\defineregimesynonym[latin3][iso-8859-3]
\defineregimesynonym[latin4][iso-8859-4]
\defineregimesynonym[latin5][iso-8859-9]
\defineregimesynonym[latin6][iso-8859-10]
\defineregimesynonym[latin7][iso-8859-13]
%defineregimesynonym[latin8][iso-8859-14] % not supported yet
\defineregimesynonym[latin9][iso-8859-15]
\defineregimesynonym[latin10][iso-8859-16]

% for historical reasons
\defineregimesynonym[iso88595][iso-8859-5]
\defineregimesynonym[grk][iso-8859-7]

I can send the new files as soon as it becomes clear how to group them. If, additionally, the rest of the questions are answered, the new files can become more consistent without breaking anything.

Sorry for the long mail,
Mojca
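
P.S. A rough sketch of how the proposed synonyms might look from the user's side. It assumes that a regime is actually available under the name iso-8859-2 (which depends on the renaming suggested in point 7) and that the synonym lines are loaded before the regime is enabled. I have not tested this against the current regime code, so take it as an illustration only:

```tex
% Sketch only: assumes a regime named iso-8859-2 exists
% (see the renaming proposed in point 7).
\defineregimesynonym[latin2][iso-8859-2]
\defineregimesynonym[il2][iso-8859-2]

% With the synonyms in place, either name should select the same regime.
\enableregime[latin2]

\starttext
Text typed in an ISO 8859-2 encoded source file.
\stoptext
```

The idea is that existing documents saying \enableregime[il2] keep working, while new documents can use whichever of the more descriptive names the author prefers.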