Over the last few days, I have been playing around with some TrueType fonts, preparing them for use with ConTeXt by creating tfms via the texnansi encoding. Some of these fonts have expert features embedded in their glyph sets. texnansi automatically takes care of integrating the "ff", "ffi" and "ffl" ligatures. In order to extract small caps and old-style numerals, I created a modified texnansi encoding. Here it is:

/TeXnANSISCEncoding [
/.notdef % 0
/Euro % /Uni20AC 1
/.notdef % 2
/.notdef % 3
/fraction % 4
/dotaccent % 5
/hungarumlaut % 6
/ogonek % 7
/fl % 8
/.notdef % /fraction % 9 not used (see 4), backward compatibility only
/cwm % 10 not used, except boundary char internally maybe
/ff % 11
/fi % 12
/.notdef % /fl % 13 not used (see 8), backward compatibility only
/ffi % 14
/ffl % 15
/dotlessi % 16
/dotlessj % 17
/grave % 18
/acute % 19
/caron % 20
/breve % 21
/macron % 22
/ring % 23
/cedilla % 24
/germandbls % 25
/AEsmall % 26
/OEsmall % 27
/Oslashsmall % 28
/AE % 29
/OE % 30
/Oslash % 31
/space % 32 % /suppress in TeX text
/exclam % 33
/quotedbl % 34 % /quotedblright in TeX text
/numbersign % 35
/dollar % 36
/percent % 37
/ampersand % 38
/quoteright % 39 % /quotesingle in ANSI
/parenleft % 40
/parenright % 41
/asterisk % 42
/plus % 43
/comma % 44
/hyphen % 45
/period % 46
/slash % 47
/zerooldstyle % 48
/oneoldstyle % 49
/twooldstyle % 50
/threeoldstyle % 51
/fouroldstyle % 52
/fiveoldstyle % 53
/sixoldstyle % 54
/sevenoldstyle % 55
/eightoldstyle % 56
/nineoldstyle % 57
/colon % 58
/semicolon % 59
/less % 60 % /exclamdown in TeX text
/equal % 61
/greater % 62 % /questiondown in TeX text
/question % 63
/at % 64
/A % 65
/B % 66
/C % 67
/D % 68
/E % 69
/F % 70
/G % 71
/H % 72
/I % 73
/J % 74
/K % 75
/L % 76
/M % 77
/N % 78
/O % 79
/P % 80
/Q % 81
/R % 82
/S % 83
/T % 84
/U % 85
/V % 86
/W % 87
/X % 88
/Y % 89
/Z % 90
/bracketleft % 91
/backslash % 92 % /quotedblleft in TeX text
/bracketright % 93
/circumflex % 94 % /asciicircum in ASCII
/underscore % 95 % /dotaccent in TeX text
/quoteleft % 96 % /grave accent in ANSI
/Asmall % 97
/Bsmall % 98
/Csmall % 99
/Dsmall % 100
/Esmall % 101
/Fsmall % 102
/Gsmall % 103
/Hsmall % 104
/Ismall % 105
/Jsmall % 106
/Ksmall % 107
/Lsmall % 108
/Msmall % 109
/Nsmall % 110
/Osmall % 111
/Psmall % 112
/Qsmall % 113
/Rsmall % 114
/Ssmall % 115
/Tsmall % 116
/Usmall % 117
/Vsmall % 118
/Wsmall % 119
/Xsmall % 120
/Ysmall % 121
/Zsmall % 122
/braceleft % 123 % /endash in TeX text
/bar % 124 % /emdash in TeX text
/braceright % 125 % /hungarumlaut in TeX text
/tilde % 126 % /asciitilde in ASCII
/dieresis % 127 not used (see 168), use higher up instead
/Lslash % 128 this position is unfortunate, but now too late to fix
/quotesingle % 129
/quotesinglbase % 130
/florin % 131
/quotedblbase % 132
/ellipsis % 133
/dagger % 134
/daggerdbl % 135
/circumflex % 136
/perthousand % 137
/Scaron % 138
/guilsinglleft % 139
/OE % 140
/Zcaron % 141
/asciicircum % 142
/minus % 143
/lslash % 144
/quoteleft % 145
/quoteright % 146
/quotedblleft % 147
/quotedblright % 148
/bullet % 149
/endash % 150
/emdash % 151
/tilde % 152
/trademark % 153
/scaron % 154
/guilsinglright % 155
/oe % 156
/zcaron % 157
/asciitilde % 158
/Ydieresis % 159
/nbspace % 160 % /space (no break space)
/exclamdown % 161
/cent % 162
/sterling % 163
/currency % 164
/yen % 165
/brokenbar % 166
/section % 167
/dieresis % 168
/copyright % 169
/ordfeminine % 170
/guillemotleft % 171
/logicalnot % 172
/sfthyphen % 173 % /hyphen (hanging hyphen)
/registered % 174
/macron % 175
/degree % 176
/plusminus % 177
/twosuperior % 178
/threesuperior % 179
/acute % 180
/mu % 181
/paragraph % 182
/periodcentered % 183
/cedilla % 184
/onesuperior % 185
/ordmasculine % 186
/guillemotright % 187
/onequarter % 188
/onehalf % 189
/threequarters % 190
/questiondown % 191
/Agrave % 192
/Aacute % 193
/Acircumflex % 194
/Atilde % 195
/Adieresis % 196
/Aring % 197
/AE % 198
/Ccedilla % 199
/Egrave % 200
/Eacute % 201
/Ecircumflex % 202
/Edieresis % 203
/Igrave % 204
/Iacute % 205
/Icircumflex % 206
/Idieresis % 207
/Eth % 208
/Ntilde % 209
/Ograve % 210
/Oacute % 211
/Ocircumflex % 212
/Otilde % 213
/Odieresis % 214
/multiply % 215 % OE in T1
/Oslash % 216
/Ugrave % 217
/Uacute % 218
/Ucircumflex % 219
/Udieresis % 220
/Yacute % 221
/Thorn % 222
/germandbls % 223
/Agravesmall % 224
/Aacutesmall % 225
/Acircumflexsmall % 226
/Atildesmall % 227
/Adieresissmall % 228
/Aringsmall % 229
/AEsmall % 230
/Ccedillasmall % 231
/Egravesmall % 232
/Eacutesmall % 233
/Ecircumflexsmall % 234
/Edieresissmall % 235
/Igravesmall % 236
/Iacutesmall % 237
/Icircumflexsmall % 238
/Idieresissmall % 239
/eth % 240
/Ntildesmall % 241
/Ogravesmall % 242
/Oacutesmall % 243
/Ocircumflexsmall % 244
/Otildesmall % 245
/Odieresissmall % 246
/divide % 247 % oe in T1
/Oslashsmall % 248
/Ugravesmall % 249
/Uacutesmall % 250
/Ucircumflexsmall % 251
/Udieresissmall % 252
/Yacutesmall % 253
/Thornsmall % 254
/Ydieresissmall % 255 % germandbls in T1
] def

With the help of this encoding, I was able to create small-cap fonts with old-style numerals that I could then use in my typescript. So I'm wondering: is this a good idea, or is there a simpler way of doing this?

Best

Thomas
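The substitution behind this vector can be sketched in a few lines of Python. This is only an illustration, not the way the encoding above was actually produced: the helper name osfsc_name and the rule tables are assumptions, and the rules only approximate Thomas's vector (for example, his slot 156 keeps /oe, whereas this sketch would rename it).

```python
import string

DIGITS = "zero one two three four five six seven eight nine".split()
ACCENTS = ("grave", "acute", "circumflex", "tilde", "dieresis", "ring", "cedilla")
# Lowercase glyphs whose small-cap name does not follow the simple rules below.
SPECIAL = {"ae": "AEsmall", "oe": "OEsmall",
           "oslash": "Oslashsmall", "thorn": "Thornsmall"}


def osfsc_name(glyph):
    """Return the small-cap / old-style name for a base glyph, else the glyph itself."""
    if glyph in DIGITS:
        return glyph + "oldstyle"          # five -> fiveoldstyle
    if glyph in SPECIAL:
        return SPECIAL[glyph]              # ae -> AEsmall
    # A single lowercase letter, optionally followed by a known accent name.
    if glyph and glyph[0] in string.ascii_lowercase and glyph[1:] in ("",) + ACCENTS:
        return glyph[0].upper() + glyph[1:] + "small"   # eacute -> Eacutesmall
    return glyph                           # punctuation, capitals, eth, ...


base = ["exclam", "five", "a", "eacute", "ae", "eth"]
print([osfsc_name(g) for g in base])
# prints ['exclam', 'fiveoldstyle', 'Asmall', 'Eacutesmall', 'AEsmall', 'eth']
```

Applying such a function to every glyph name of the base texnansi vector, slot by slot, would give a first draft that still needs checking against the glyph names the font really contains.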
Thomas A.Schmitz said this at Sat, 26 Feb 2005 10:13:00 +0100:
In the last days, I played around with some truetype fonts, preparing them for use with ConTeXt by creating tfms via the texnansi encoding.
Hello (again) Thomas,

This is good stuff. I've tried to advocate a naming convention that would be appropriate to this. I would suggest calling this texnansi-osfsc.enc, as baseencoding-variant.enc. This is so a modified encoding can "masquerade" as the base encoding within ConTeXt.

Given this encoding with my suggested name, you could therefore run texfont as follows:

    texfont --encoding=texnansi --variant=osfsc --[other options]

Variants that select rarer features than Old Style Figures and Small Caps may need to be given font-specific names, as rare glyph names tend to vary wildly between fonts.

Cheers,
adam

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Adam T. Lindsay, Computing Dept.     atl@comp.lancs.ac.uk
Lancaster University, InfoLab21        +44(0)1524/510.514
Lancaster, LA1 4WA, UK             Fax:+44(0)1524/510.492
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
This is good stuff. I've tried to advocate a naming convention that would be appropriate to this. I would suggest calling this texnansi-osfsc.enc, as baseencoding-variant.enc. This is so a modified encoding can "masquerade" as the base encoding within ConTeXt.
Given this encoding with my suggested name, you could therefore run texfont as follows:

    texfont --encoding=texnansi --variant=osfsc --[other options]

That is by far the most elegant solution indeed! I have renamed my encoding file.
Variants that select rarer features than Old Style Figures and Small Caps may need to be given font-specific names, as rare glyph names tend to vary wildly between fonts.
Sadly, you are absolutely right about this. And it's not only rare glyphs that get wildly different names. There was some rumor on the TeX on OS X list that people couldn't get the beautiful HoeflerText font to work with TeX; it turned out that this was true for newer versions of the font only. I looked into it, and it turns out that Apple (?) has given new names even to quite "normal" characters - eacute becomes e_acute etc. So if you want to produce a tfm for that font, you have to invent a specific encoding vector. Once you know how this works, it's easy enough, but really annoying.

So, the "variant" scheme in texfont is at least a convenient way to cope with this mess.

Best

Thomas
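Inventing such a font-specific vector amounts to walking the base vector and swapping in whatever name the font uses for each slot. A minimal Python sketch, assuming the set of glyph names in the font is already known (for instance from its afm file); the function names, the rename table and the encoding name HoeflerDemoEncoding are all hypothetical, and only the eacute/e_acute pair comes from the message above:

```python
def adapt_vector(vector, font_glyphs, renames):
    """Replace names the font lacks by their font-specific alias, else .notdef."""
    adapted = []
    for name in vector:
        if name in font_glyphs:
            adapted.append(name)
        elif renames.get(name) in font_glyphs:
            adapted.append(renames[name])
        else:
            adapted.append(".notdef")
    return adapted


def format_enc(enc_name, vector):
    """Write the vector in the one-glyph-per-line style of texnansi.enc."""
    lines = ["/%s [" % enc_name]
    lines += ["/%s %% %d" % (glyph, slot) for slot, glyph in enumerate(vector)]
    lines.append("] def")
    return "\n".join(lines)


# Toy data: a real vector has exactly 256 slots.
base = ["A", "eacute", "ogonek"]
font_glyphs = {"A", "e_acute"}
renames = {"eacute": "e_acute"}        # hypothetical rename table

print(format_enc("HoeflerDemoEncoding", adapt_vector(base, font_glyphs, renames)))
```

Keeping the rename table separate from the base vector means the same texnansi layout can be reused for each awkward font, which is the point of the "variant" naming scheme.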
Thomas A.Schmitz said this at Sun, 27 Feb 2005 08:57:36 +0100:
I looked into it, and it turns out that Apple (?) has given new names even to quite "normal" characters - eacute becomes e_acute etc. So if you want to produce a tfm for that font, you have to invent a specific encoding vector. Once you know how this works, it's easy enough, but really annoying.
Indeed, that's one of the reasons why I came up with the unicode ("symbol"[1]) scripts... there are common utilities (ttx and Apple's ftx suite) that work well at associating canonical characters with glyph names specific to a font. I'm sure some enterprising XSLT hacker could take my scripts as a starting point and make them work with specific TeXy encodings, not just individual Unicode vectors.
So, the "variant" scheme in texfont is at least a convenient way to cope with this mess.
Well, it's the simplest of hacks to help *manage* the mess. And it's not really new--the concept of "variant" is all over Karl Berry's Fontname conventions. This just makes it a bit more user-friendly.

[1] Which is to say that ttx2enc.xsl got its first public airing in http://homepage.mac.com/atl/tex/symb-uni.zip, in the context of "Unicode Symbols", but there's nothing inherent to symbols there--it's all about Unicode in general.

Cheers,
adam
Adam,

I feel like a complete idiot now. I had been so proud of this idea, but after re-reading your MyWay about OpenType, I see that I had been reinventing the wheel: this is exactly the solution you had been suggesting almost two years ago. Thanks for being generous about this...

However, your post made me think: I know nothing about XSLT, but enough perl to shoot myself in the foot. I guess if I had a version of texnansi.enc with the unicode values in addition to the names, that would be a good starting point. I was thinking of this route:

1. use ftxdumperfuser to produce cmap.xml,
2. use perl to reduce it to two values: glyphName % UNICODE_VALUE
3. use perl to extract the lines corresponding to a given encoding and put them in the right order.

Sounds feasible? Do you know where I could get such a unicode-aware version of texnansi.enc?

Best

Thomas

On Feb 27, 2005, at 10:39 AM, Adam Lindsay wrote:
Indeed, that's one of the reasons why I came up with the unicode ("symbol"[1]) scripts... there are common utilities (ttx and Apple's ftx suite) that work well at associating canonical characters with glyph names specific to a font.
I'm sure some enterprising XSLT hacker could take my scripts as a starting point and make them work with specific TeXy encodings, not just individual Unicode vectors.
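The three-step route Thomas outlines (dump the cmap, reduce it to glyph name and Unicode value, reorder by encoding) can be sketched as follows. Python is used here instead of perl, and several things are assumptions: the sample XML follows the `<map code="..." name="..."/>` layout of a ttx cmap dump, which may differ from what ftxdumperfuser writes; the slot list stands in for the "unicode-aware texnansi.enc" that does not exist yet; and the function names are made up. Only glyphs that have a cmap entry can be found this way.

```python
import xml.etree.ElementTree as ET

# Assumed ttx-style cmap fragment; ftxdumperfuser output may look different.
SAMPLE_CMAP = """<cmap>
  <cmap_format_4 platformID="3" platEncID="1" language="0">
    <map code="0x41" name="A"/>
    <map code="0xe9" name="e_acute"/>
    <map code="0x20ac" name="Euro"/>
  </cmap_format_4>
</cmap>"""


def unicode_to_glyph(cmap_xml):
    """Steps 1-2: reduce the cmap dump to {unicode value: glyph name}."""
    root = ET.fromstring(cmap_xml)
    return {int(m.get("code"), 16): m.get("name") for m in root.iter("map")}


def build_vector(slots, table):
    """Step 3: put the font's glyph names in the order of the encoding.

    `slots` holds one Unicode value (or None) per encoding position.
    """
    return [table.get(cp, ".notdef") if cp is not None else ".notdef"
            for cp in slots]


# Hypothetical start of a unicode-aware encoding: slot 0 empty, slot 1 Euro, ...
slots = [None, 0x20AC, 0x41, 0xE9, 0x42]
table = unicode_to_glyph(SAMPLE_CMAP)
for slot, glyph in enumerate(build_vector(slots, table)):
    print("/%s %% %d" % (glyph, slot))
```

The output uses the same "/name % slot" line style as the encoding file, so it could be pasted between the "[" and "] def" lines of a vector.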
Thomas A.Schmitz said this at Wed, 2 Mar 2005 07:18:29 +0100:
However, your post made me think: I know nothing about XSLT, but enough perl to shoot myself in the foot. I guess if I had a version of texnansi.enc with the unicode values in addition to the names, that would be a good starting point. I was thinking of this route: 1. use ftxdumperfuser to produce cmap.xml, 2. use perl to reduce it to two values: glyphName % UNICODE_VALUE 3. use perl to extract the lines corresponding to a given encoding and put them in the right order.
Hold on one minute... we're talking about encodings for alternate glyphs, right? That's orthogonal to what Unicode is about. 'a' and 'Asmall' pretty much take up the same unicode "slot". Only 'a' appears in the .cmap.xml file.

However, a perl-based solution would be very handy, especially as new free fonts like FPL-Neu (Palatino clone) include OSF and SC glyphs.

http://home.vr-web.de/was/x/FPL/

The closest I got with perl was some experiments following some rough heuristics appending "small" to glyph names from an afm file. I got a bit discouraged, however, and didn't take it further at the time. So I clapped my hands and giggled girlishly when XeTeX came out and gave me easy access to the particular AAT fonts I was trying to get to work.
Sounds feasible? Do you know where I could get such a unicode-aware version of texnansi.enc?
Now that would be a useful thing, regardless. I don't know, but I'll have a look. I suspect we'll have to create one ourselves. An idle thought (with the corresponding devilishness) occurs to me: all that information is in ConTeXt already. Hmm...

What form would be the best? Some simple XML? A perl-friendly list?
On Mar 2, 2005, at 10:59 AM, Adam Lindsay wrote:
Hold on one minute... we're talking about encodings for alternate glyphs, right? That's orthogonal to what Unicode is about. 'a' and 'Asmall' pretty much take up the same unicode "slot". Only 'a' appears in the .cmap.xml file.
No, of course you're right. I thought that they were given a value in the FFxx range, but that's not right; they don't appear in the cmap, only in the afm. So the only thing I can think of: there are only so many ways to refer to small caps, Xsmall or X.small or X_small or even X-small. We could provide alternatives for that in perl, making additions as we go. It's a brute-force attack, kind of aiming with a machine gun, but since fonts are such moving targets...
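The brute-force idea can be sketched like this, in Python rather than perl: collect the glyph names from the afm file, then try each known spelling in turn. The four patterns are the ones listed above; the function names and the sample afm lines are made up for illustration, and new patterns would be appended to the tuple as they turn up.

```python
# Naming patterns to try, in order; extend as new fonts show new spellings.
PATTERNS = ("{0}small", "{0}.small", "{0}_small", "{0}-small")

# Made-up character-metric lines in AFM syntax ("N" introduces the glyph name).
SAMPLE_AFM = """StartCharMetrics 3
C 65 ; WX 722 ; N A ; B 15 0 706 674 ;
C -1 ; WX 520 ; N A.small ; B 10 0 510 480 ;
C -1 ; WX 500 ; N B_small ; B 20 0 470 480 ;
EndCharMetrics"""


def afm_glyph_names(afm_text):
    """Collect the glyph names from the character-metric lines of an afm file."""
    names = set()
    for line in afm_text.splitlines():
        if not line.startswith("C "):
            continue
        for field in line.split(";"):
            parts = field.split()
            if len(parts) == 2 and parts[0] == "N":
                names.add(parts[1])
    return names


def find_small_cap(letter, glyphs):
    """Return the font's small-cap glyph name for a letter, or None if not found."""
    for pattern in PATTERNS:
        candidate = pattern.format(letter.upper())
        if candidate in glyphs:
            return candidate
    return None


glyphs = afm_glyph_names(SAMPLE_AFM)
for letter in "abc":
    print(letter, "->", find_small_cap(letter, glyphs))
```

A letter that yields None would keep its ordinary lowercase glyph (or .notdef) in the resulting encoding vector.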
Now that would be a useful thing, regardless. I don't know, but I'll have a look. I suspect we'll have to create one ourselves. An idle thought (with the corresponding devilishness) occurs to me: all that information is in ConTeXt already. Hmm...
What form would be the best? Some simple XML? A perl-friendly list?
For the time being, I'm thinking of a very simple list that could just serve as a pattern for arranging the lines I get from processing the cmap.xml. I'm just thinking, not writing code yet... Best Thomas
Thomas A.Schmitz wrote:
For the time being, I'm thinking of a very simple list that could just serve as a pattern for arranging the lines I get from processing the cmap.xml. I'm just thinking, not writing code yet...
about code ... wybo dekker has cleaned up the texfont code, so that will be the starting point for extensions

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
Hans Hagen said this at Wed, 2 Mar 2005 11:54:35 +0100:
about code ... wybo dekker has cleaned up the texfont code, so that will be the starting point for extensions
Wow. Thanks, Wybo!
Adam Lindsay wrote:
This is good stuff. I've tried to advocate a naming convention that would be appropriate to this. I would suggest calling this texnansi-osfsc.enc, as baseencoding-variant.enc. This is so a modified encoding can "masquerade" as the base encoding within ConTeXt.
i'll add the encoding to the distribution (i just made the formatted file with the info sent) [of course users will need to generate the tfm files themselves]

once we have made the switch from map files to inline map code, we can apply different encodings more easily at the typescript level (no more need for map files)

another thing coming is that pdftex will provide primitives to set those encodings independently of other characteristics (hartmut is working on this)

Hans
participants (4)
- Adam Lindsay
- h h extern
- Hans Hagen
- Thomas A.Schmitz