Hi,

I've been writing a script that sifts through the unic-xxx.tex files to get a readable mapping of which Unicode characters are supported using \Amacron-style names. In the process I found one bug and something that might be another bug:

- the Cyrillic block (unic-004.tex) is missing an \unknownchar line for U+04CF, so the remaining (few) glyphs are off by one
- the Hebrew block (unic-005.tex) starts with a \numexpr line indicating an offset of 224 = 0xE0; however, the first character in the list is U+05D0. So either the whole block is off by 16, starting at 0x05E0 instead of 0x05D0, or the 224 should be 208 (= 0xD0) instead.

BTW, unic-005.tex is the only file with Macintosh line endings. Are the unic-xxx files automatically generated or maintained by hand?

Incidentally, it would now be trivial to put the list of ConTeXt glyphs on the Wiki, if anyone's interested.

I wanted to use this to work towards better support for the whole range of ConTeXt glyphs with OpenType fonts under XeTeX, by reading which ConTeXt glyphs are available in a font and building a "\catcode`ā=\active \def ā {\amacron}"-style list for the rest. (Unfortunately this kind of list would be font-specific, but the generic alternative would be a huge list of active characters with an \ifnum\XeTeXcharglyph"....>0 macro behind it, and that would probably be quite slow.) I wonder if there is a more intelligent way to achieve this goal; since part of the logic for mapping code points to glyph macros already exists, it would be easier if there were a way to reuse that.

The best way out would be if I could enable ConTeXt's UTF-8 regime while running XeTeX in \XeTeXinputencoding=bytes mode, but I haven't gotten that to work yet.

Philipp
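As an illustration, the generic fallback described above might be sketched roughly as follows. This is a hypothetical sketch, not working ConTeXt code: it assumes XeTeX's \XeTeXcharglyph primitive, an existing ConTeXt \amacron macro, and uses U+0101 (ā) as the example character.

```tex
% Hypothetical sketch: make the character active and decide at use time
% whether the current font provides the glyph (assumes XeTeX and a
% ConTeXt \amacron macro).
\catcode`ā=\active
\defā{%
  \ifnum\XeTeXcharglyph"0101>0
    \char"0101 % the font has the glyph: typeset it directly
  \else
    \amacron   % otherwise fall back to the ConTeXt glyph macro
  \fi}
```

The font-specific variant would instead emit a plain \def ā{\amacron} only for those code points the font is known to lack, avoiding the per-use test.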
Philipp Reichmuth wrote:
I've been writing a script that sifts through the unic-xxx.tex files to get a readable mapping of which Unicode characters are supported using \Amacron-style names.
mtxtools can create such lists using the unicode consortium glyph table, mojca's mapping list and enco/regi files; we use mtxtools to create the tables needed for xetex (used for case mapping) and luatex (more extensive manipulations)
In the process I found one bug and something that might be another bug:
- the Cyrillic block (unic-004.tex) is missing an \unknownchar line for U+04CF, so that the remaining (few) glyphs are off by one
just mail me the patched file
- the Hebrew block (unic-005.tex) starts with a \numexpr line indicating an offset of 224 = 0xE0; however, the first character in the list is U+05D0. So either the whole block is off by 16, starting at 0x05E0 instead of 0x05D0, or the 224 should be 208 (= 0xD0) instead. BTW, unic-005.tex is the only file with Macintosh line endings. Are the unic-xxx files automatically generated or maintained by hand?
maintained by hand, again, just send me the fixed file, but we need to make sure that the fix is ok (i.e. works as expected)
Incidentally, it would be trivial now to put the list of ConTeXt glyphs on the Wiki, if anyone's interested.
there is a file contextnames.txt in the distributions (maintained by mojca), while the not yet distributed char-def.lua has the info for luatex
I wanted to use this to work towards better support for the whole range of ConTeXt glyphs with OpenType fonts under XeTeX, by reading which ConTeXt glyphs are available in a font and building a "\catcode`ā=\active \def ā {\amacron}"-style list for the rest. (Unfortunately this kind of list would be font-specific, but the generic alternative would be a huge list of active characters with an \ifnum\XeTeXcharglyph"....>0 macro behind it, and that would probably be quite slow.) I wonder if there is a more intelligent way to achieve this goal; since part of the logic for mapping code points to glyph macros already exists, it would be easier if there were a way to reuse that.
best take a look at mtxtools; if needed we can generate the definitions; concerning speed, it will not be that slow, because tex is quite fast on such tests (unless XeTeXcharglyph is slow due to lib access); the biggest thing is to make sure that things don't expand in unwanted ways. (i must find time to update my xetex bin; i must admit that i never tried to use open type fonts in xetex, as the mac is broken)
The best way out would be if I could enable ConTeXt's UTF-8 regime while running XeTeX in \XeTeXinputencoding=bytes mode, but I haven't gotten that to work yet.
maybe mojca has

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
Hans Hagen wrote:
mtxtools can create such lists using the unicode consortium glyph table, mojca's mapping list and enco/regi files
we use mtxtools to create the tables needed for xetex (used for case mapping) and luatex (more extensive manipulations)
Sounds interesting. Where do I get that? It's not in the distribution.
maintained by hand, again, just send me the fixed file, but we need to make sure that the fix is ok (i.e. works as expected)
OK, I just sent you the files. Can anyone test this who can read Hebrew?
best take a look at mtxtools; if needed we can generate the definitions ; concerning speed, it will not be that slow, because tex is quite fast on such tests (unless XeTeXcharglyph is slow due to lib access); the biggest thing is to make sure that things don't expand in unwanted ways.
OK, I'll experiment a little bit and if anything comes of it I'll post again. Philipp
On 11/5/06, Hans Hagen wrote:
Philipp Reichmuth wrote:
I've been writing a script that sifts through the unic-xxx.tex files to get a readable mapping of which Unicode characters are supported using \Amacron-style names.
mtxtools can create such lists using the unicode consortium glyph table, mojca's mapping list and enco/regi files
we use mtxtools to create the tables needed for xetex (used for case mapping) and luatex (more extensive manipulations)
I have mtxtools.bat, but no mtxtools.rb here.
Are the unic-xxx files automatically generated or maintained by hand?
maintained by hand, again, just send me the fixed file, but we need to make sure that the fix is ok (i.e. works as expected)
Although there should be no reason not to generate them automatically. I did that for the regime files (I only wrote a script, executed it, and Hans included the files, so it's only semi-automatic; it would be polite of me to incorporate that into the existing [whateverthename]tools.rb).
Incidentally, it would be trivial now to put the list of ConTeXt glyphs on the Wiki, if anyone's interested.
there is a file contextnames.txt in the distributions (maintained by mojca), while the not yet distributed char-def.lua has the info for luatex
If you find errors there, please let me know. (The missing letter in Cyrillic was due to a missing position in Unicode.)
I wanted to use this to work towards better support for the whole range of ConTeXt glyphs with OpenType fonts under XeTeX, by reading which ConTeXt glyphs are available in a font and building a "\catcode`ā=\active \def ā {\amacron}"-style list for the rest. (Unfortunately this kind of list would be font-specific, but the generic alternative would be a huge list of active characters with an \ifnum\XeTeXcharglyph"....>0 macro behind it, and that would probably be quite slow.) I wonder if there is a more intelligent way to achieve this goal; since part of the logic for mapping code points to glyph macros already exists, it would be easier if there were a way to reuse that.
best take a look at mtxtools; if needed we can generate the definitions ; concerning speed, it will not be that slow, because tex is quite fast on such tests (unless XeTeXcharglyph is slow due to lib access); the biggest thing is to make sure that things don't expand in unwanted ways.
(i must find time to update my xetex bin; i must admit that i never tried to use open type fonts in xetex, as the mac is broken)
But OpenType fonts also work on Linux & Windows.
The best way out would be if I could enable ConTeXt's UTF-8 regime while running XeTeX in \XeTeXinputencoding=bytes mode, but I haven't gotten that to work yet.
maybe mojca has
You could theoretically comment out the \beginXETEX \expandafter \endinput \endXETEX block in regi-utf.tex, but that's not the best idea.

Mojca
The best way out would be if I could enable ConTeXt's UTF-8 regime while running XeTeX in \XeTeXinputencoding=bytes mode, but I haven't gotten that to work yet.
That would mean that you lose the whole range of glyphs & scripts outside the scope that ConTeXt supports (you would land almost at the level of pdfTeX again). For most European users that might still be reasonable, but I wouldn't go that way.
maybe mojca has
(A little correction to what I wrote in my previous mail.)

If you were really looking for that part of the code: simply replace \expandafter \endinput inside the XETEX block in regi-utf.tex with \XeTeXinputencoding=bytes. Then \enableregime[utf-8] will mean that ConTeXt takes control of utf instead of XeTeX. From what I understood on the wiki, it probably used to be that way at the beginning, but then Hans changed his mind and decided to ignore \enableregime[utf] completely when processing with XeTeX.

Mojca
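For reference, the edit described here might look roughly like this inside regi-utf.tex. This is only a sketch of the suggested change, not the actual file contents; the surrounding code in the real file may differ.

```tex
% Sketch of the suggested change in regi-utf.tex (hypothetical layout):
\beginXETEX
  % \expandafter \endinput      % original: bail out under XeTeX,
                                % leaving utf handling to XeTeX itself
  \XeTeXinputencoding=bytes     % replacement: feed ConTeXt raw bytes so
                                % its own utf regime stays in charge
\endXETEX
```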
Mojca Miklavec wrote:
The best way out would be if I could enable ConTeXt's UTF-8 regime while running XeTeX in \XeTeXinputencoding=bytes mode, but I haven't gotten that to work yet.
That would mean that you lose the whole range of glyphs & scripts outside the scope that ConTeXt supports (you would land almost at the level of pdfTeX again). For most European users that might still be reasonable, but I wouldn't go that way.
maybe mojca has
(little correction to what I wrote in my previous mail)
If you were really looking for that part of the code: simply replace \expandafter \endinput inside the XETEX block in regi-utf.tex with \XeTeXinputencoding=bytes. Then \enableregime[utf-8] will mean that ConTeXt takes control of utf instead of XeTeX. From what I understood on the wiki, it probably used to be that way at the beginning, but then Hans changed his mind and decided to ignore \enableregime[utf] completely when processing with XeTeX.
indeed; when this is uncommented (i.e. traditional utf is used) ... do the patterns still work as expected?
Mojca Miklavec wrote:
On 11/5/06, Hans Hagen wrote:
Philipp Reichmuth wrote:
I've been writing a script that sifts through the unic-xxx.tex files to get a readable mapping of which Unicode characters are supported using \Amacron-style names.
mtxtools can create such lists using the unicode consortium glyph table, mojca's mapping list and enco/regi files
we use mtxtools to create the tables needed for xetex (used for case mapping) and luatex (more extensive manipulations)
I have mtxtools.bat, but no mtxtools.rb here.
hm, will be in the mkiv zip soon (or maybe all mkiv code will be in the main zip; depends on how many context users want to experiment with the declared-stable parts of luatex)
Are the unic-xxx files automatically generated or maintained by hand?
maintained by hand, again, just send me the fixed file, but we need to make sure that the fix is ok (i.e. works as expected)
Although there should be no reason not to generate them automatically. I did that for the regime files (I only wrote a script, executed it, and Hans included the files, so it's only semi-automatic; it would be polite of me to incorporate that into the existing [whateverthename]tools.rb).
we should indeed discuss a way to keep these things up to date, esp since in context mkiv we will use table entries like

  [0x00F4] = { unicodeslot=0x00F4, category='ll', adobename='ocircumflex',
               contextname='ocircumflex',
               description='LATIN SMALL LETTER O WITH CIRCUMFLEX',
               shcode=0x006F, uccode=0x00D4 },
  [0x00F5] = { unicodeslot=0x00F5, category='ll', adobename='otilde',
               contextname='otilde',
               description='LATIN SMALL LETTER O WITH TILDE',
               shcode=0x006F, uccode=0x00D5 },

for manipulating encodings, fonts, and whatever
But OpenType fonts also work on Linux & Windows.
sure, but one needs this fontconfig thing; in my opinion xetex makes sense when it integrates automatically into the os-specific font stuff, since xetex has the 'use libraries when possible' approach; so, i'll happily wait till the announced integration is there (i prefer to invest my time only once in this area: cook up a generic and flexible way for luatex and then derive xetex stuff from that)

Hans
Hans Hagen wrote:
hm, will be in the mkiv zip soon (or maybe all mkiv code will be in the main zip; depends on how many context users want to experiment with the declared-stable parts of luatex)
I would.
But OpenType fonts also work on Linux & Windows.
sure, but one needs this fontconfig thing ;
But at least on Windows that's relatively transparent. You tell fontconfig once where your fonts are (e.g. c:\windows\fonts), and that's basically it.

Philipp
Hans Hagen wrote:
depends on how many context users want to experiment with the declared-stable parts of luatex
I'll experiment, especially if I can figure out a set of magic kpathsea paths to keep mkii and mkiv in parallel.

-Sanjoy

`Never underestimate the evil of which men of power are capable.'
   --Bertrand Russell, _War Crimes in Vietnam_, chapter 1.
Sanjoy Mahajan wrote:
Hans Hagen wrote:
depends on how many context users want to experiment with the declared-stable parts of luatex
I'll experiment, especially if I can figure out a set of magic kpathsea paths to keep mkii and mkiv in parallel.
no need for that; it is made to run in parallel, just an extra zip with mkiv and lua files ending up in base, and luatools.lua ending up in the script path; also, mkiv does not use kpse -)

Hans
I'll experiment, especially if I can figure out a set of magic kpathsea paths to keep mkii and mkiv in parallel.
no need for that ; it is made to run in parallel, just an extra zip with mkiv and lua files ending up in base, and luatools.lua ending up in the script path; also, mkiv does not use kpse -)
Great. When the mkiv zip is available, I'll try it (I tried poking around the pragma site but didn't find it). No kpse is indeed good news!

About backward compatibility: maybe the mkiv transition is the time to sacrifice backward compatibility in a few instances where it makes the code or user interface simpler? One example off the top of my head is \setuppapersize[ABC] becoming equivalent to \setuppapersize[ABC][ABC] (rather than to \setuppapersize[ABC][A4]), and there are no doubt others. Or is the (understandable) policy of ConTeXt development that backward compatibility is paramount?

-Sanjoy
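To make the \setuppapersize proposal concrete, here is a small illustration (the "proposed" behaviour is hypothetical, not current ConTeXt; the second bracket selects the print paper size):

```tex
% Current behaviour, as described above:
\setuppapersize[A5]       % acts like \setuppapersize[A5][A4]:
                          % an A5 page imposed on A4 print paper
% Proposed behaviour (hypothetical):
\setuppapersize[A5]       % would act like \setuppapersize[A5][A5]:
                          % page size and print size both A5
```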
Sanjoy Mahajan wrote:
I'll experiment, especially if I can figure out a set of magic kpathsea paths to keep mkii and mkiv in parallel.
no need for that ; it is made to run in parallel, just an extra zip with mkiv and lua files ending up in base, and luatools.lua ending up in the script path; also, mkiv does not use kpse -)
Great. When the mkiv zip is available, I'll try it (tried poking around the pragma site but didn't find it). No kpse is indeed good news!
i will put it there as soon as tex live is really frozen since we don't want a mess up now
About backward compatibility: maybe the mkiv transition is the time to sacrifice backward compatibility in a few instances where it makes the code or user interface simpler? One example off the top of my head is \setuppapersize[ABC] becoming equivalent to \setuppapersize[ABC][ABC] (rather than to \setuppapersize[ABC][A4]), and there are no doubt others. Or is the (understandable) policy of ConTeXt development that backward compatibility is paramount?
i've always tried to be downward compatible, but some changes are less dangerous (like the setuppapersize proposal); however, such changes would then also affect mkii (the typesetting part is mostly the same); another issue is that i want to move towards a 'macro package building block' approach so that one can combine components to make specialized versions

anyhow, you can collect UI issues and organize a poll on the wiki

Hans
participants (4)
- Hans Hagen
- Mojca Miklavec
- Philipp Reichmuth
- Sanjoy Mahajan