Christopher Creutzig wrote:
Hans Hagen wrote:
So why not map the characters to Unicode first and define the mapping from Unicode to \TeX commands only once? The regi-* files (at least with the meaning they have now) could be prepared automatically by a script, less error-prone and without the need to say "Some more definitions will be added later."
you mean ...
\defineactivetoken 123 {\uchar{...}{...}}
it is an option, but it's much slower and takes much more memory
I may be wrong, of course, but I think Mojca proposed something different (and something that should be really easy to implement): have the Unicode vectors stored in a format easily parsed by an external Ruby script, and create the regi-* files from that, using the conversion tables provided by your operating system, by iconv, or wherever Ruby gets them from.
Yes, I had something different in mind.

A1.) Prepare the files to be used as a source of transformation from "any" character set to UTF, and prepare a list of synonyms for encodings. Example: a file that says that in ISO-8859-2, character 0xA3 represents the Unicode character 0x0141 (lstroke) -- for every character, for every Mac/Windows/ISO/[...] encoding that we want to support.

A2.) Write a script which automatically generates regi-* files from those files, but the regi-* files would contain only the mapping to Unicode numbers. Example:

\startregime[iso-8859-2]
...
\somecommandtomapacharactertounicode {163}{1}{65} % lstroke
...
\stopregime

A3.) Prepare a huge file with the mapping from Unicode numbers to ConTeXt commands. Example:

...
\somecommandtomapfromunicodetocontext {1}{65}{\lstroke}
...

A4.) ... I don't mind what ConTeXt does with this \lstroke afterwards, but it already seems clever enough to produce the proper glyph in the end.

What should ConTeXt do with that?

B1.) The file under A3 should be processed at the beginning. As it may become really huge, exotic definitions should only be preloaded if asked for (\usemodule[korean]), while there is probably no harm in preloading (accented) Latin, Greek, Cyrillic and punctuation (TM, copyright, ...) by default.

B2.) Once \enableregime[iso-8859-2] (or any other regime) is requested, the file with the corresponding regime definitions is processed. However, when \somecommandtomapacharactertounicode {163}{1}{65} is processed, the character 163 is not stored as \uchar{1}{65} but as \lstroke: \somecommandtomapacharactertounicode would first look up which ConTeXt command is saved under \uchar{1}{65} and call \defineactivetoken 163 {\lstroke} as a result.

I don't know the details of the ConTeXt internals, but I think (hope) it should be possible to do it this way. B1 (preloading the mapping from Unicode to TeX commands) is probably the only "hungry" step in the whole story.
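For what it's worth, the generator script (A2) and the combined A2+A3 variant can be sketched roughly like this. The thread talks about a Ruby script; this is a Python sketch that leans on the system's own codec tables instead of hand-maintained mapping files, and \somecommandtomapacharactertounicode, \defineactivetoken arguments and the \lstroke / \textellipsis names are just the placeholders used in this mail, not verified ConTeXt macros:

```python
# Rough sketch of the proposed regi-* generator (step A2), plus the
# combined A2+A3 variant.  Assumptions: macro names are the
# placeholders from this thread; UNICODE_TO_CONTEXT is a tiny made-up
# excerpt of the big A3 mapping file.

def make_regime(encoding):
    """A2: emit a regi-* file that maps bytes to Unicode numbers only."""
    lines = ["\\startregime[%s]" % encoding]
    for byte in range(128, 256):            # 0x00-0x7F is plain ASCII
        try:
            char = bytes([byte]).decode(encoding)
        except UnicodeDecodeError:
            continue                        # slot unused in this encoding
        hi, lo = divmod(ord(char), 256)     # the two \uchar{..}{..} halves
        lines.append("\\somecommandtomapacharactertounicode {%d}{%d}{%d} %% U+%04X"
                     % (byte, hi, lo, ord(char)))
    lines.append("\\stopregime")
    return "\n".join(lines)

# Excerpt of the A3 mapping (Unicode number -> ConTeXt command);
# command names as used in the mail, purely illustrative.
UNICODE_TO_CONTEXT = {
    0x0141: "\\lstroke",
    0x2026: "\\textellipsis",
}

def make_direct_regime(encoding):
    """A2+A3 combined: write the ConTeXt commands directly into the
    regi-* file, so nothing is left for step B2 to resolve."""
    lines = ["\\startregime[%s]" % encoding]
    for byte in range(128, 256):
        try:
            char = bytes([byte]).decode(encoding)
        except UnicodeDecodeError:
            continue
        cmd = UNICODE_TO_CONTEXT.get(ord(char))
        if cmd:
            lines.append("\\defineactivetoken %d {%s}" % (byte, cmd))
    lines.append("\\stopregime")
    return "\n".join(lines)

print(make_regime("iso-8859-2"))
```

For ISO-8859-2 this reproduces, among others, the example line \somecommandtomapacharactertounicode {163}{1}{65}, since byte 0xA3 decodes to U+0141; the combined variant turns the same byte straight into \defineactivetoken 163 {\lstroke}.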
I think it doesn't make any sense to ask the user to "\input regi-whatever". \enableregime and some additional definitions should be clever enough to figure out which file to process in order to enable the proper regime.

%%%%%%%%%%%%%%%%%%%%%

Christopher's idea is actually yet another alternative, which combines steps A2 and A3. If the Unicode->ConTeXt mapping is in some easy-to-parse format, there's actually no additional effort if the script writes the ConTeXt commands directly into the regi-* files instead of the Unicode numbers, so that B2 has less work to do. As long as it is guaranteed that nobody changes these files manually, this is OK. The only drawback is that if someone notices that "\textellipsis" is more suitable than "\dots", the script has to be changed and the files have to be regenerated. If the character is mapped to 0x2026 (HORIZONTAL ELLIPSIS) instead, only one line in the file with the Unicode->ConTeXt mapping (A3) has to be changed. If B2 cannot work as described, Christopher's proposal would be the only proper way to go.

%%%%%%%%%%%%%%%%%%%%%

I wanted to test \showcharacters on live.contextgarden.net (as Hans suggested that my map files are probably not OK), but it didn't compile there. (I hope it's not because of my buggy contributions over the last few days.) Is there any tool or macro to visualize all the glyphs available in a font? \showcharacters (if it works) shows only the glyphs that ConTeXt is aware of. What about the rest?

Mojca