Re: [NTG-context] Re: Dense Encoding - Comments wanted
Patrick Gundlach said this at Fri, 19 Aug 2005 12:34:11 +0200:
On Hans's suggestion, I've been working on a "dense" character encoding that does away with combining accents and extraneous symbol-like characters in favor of fully accented characters.
do you have an enc file to play with?
Sure. (didn't send to main list 'cos it's rough, still) I notice in the context of this latest discussion that I use Eth as a stand-in for Dcroat, mostly because I know slightly more about Icelandic than I do about Croatian. Although these slots are precious, perhaps I shouldn't do that. -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Adam T. Lindsay, Computing Dept. atl@comp.lancs.ac.uk Lancaster University, InfoLab21 +44(0)1524/510.514 Lancaster, LA1 4WA, UK Fax:+44(0)1524/510.492 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Adam Lindsay wrote:
Sure. (didn't send to main list 'cos it's rough, still) I notice in the context of this latest discussion that I use Eth as a stand-in for Dcroat, mostly because I know slightly more about Icelandic than I do about Croatian. Although these slots are precious, perhaps I shouldn't do that.
If, in the end, you happen to have a single slot left over, can you then please re-instate /cwm ? Hans and I intend to use a \language for url hyphenation in the future, and that means we need access to a non-marking \hyphenchar.
Greetings, Taco
Taco Hoekwater said this at Fri, 19 Aug 2005 13:20:22 +0200:
If, in the end, you happen to have a single slot left over, can you then please re-instate /cwm ?
Absolutely. I may as well do it now. Would you be happy with it in slot 0x20?
Hans and I intend to use a \language for url hyphenation in the future, and that means we need access to a non-marking \hyphenchar.
okay, *that's* the feedback I need.
-- Adam T. Lindsay
(only To: dev-context list now)
Adam Lindsay wrote:
Taco Hoekwater said this at Fri, 19 Aug 2005 13:20:22 +0200:
If, in the end, you happen to have a single slot left over, can you then please re-instate /cwm ?
Absolutely. I may as well do it now. Would you be happy with it in slot 0x20?
Any slot will do, it will only ever be accessed through a \hyphenchar \font = <int> assignment. Just take one you do not expect to need for anything else. Cheers, Taco
Taco Hoekwater said this at Fri, 19 Aug 2005 13:51:40 +0200:
Any slot will do, it will only ever be accessed through a \hyphenchar \font = <int> assignment. Just take one you do not expect to need for anything else.
Well, this is where some deep TeX experience comes in: slot (dec) 32 isn't really used for a space, ever, correct? So there's little reason to keep it as a space, right?
-- Adam T. Lindsay
Taco Hoekwater wrote:
Any slot will do, it will only ever be accessed through a \hyphenchar \font = <int> assignment. Just take one you do not expect to need for anything else.
so why take the problematic slot number 0 (zero) Hans ----------------------------------------------------------------- Hans Hagen | PRAGMA ADE Ridderstraat 27 | 8061 GH Hasselt | The Netherlands tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl -----------------------------------------------------------------
Sure. (didn't send to main list 'cos it's rough, still)
btw: 'dense' can now be a parameter to http://fun.contextgarden.net/encodingtable/enctable.rb? So there is no /plus and /ampersand (and alike)? How would these be accessed? By making the chars active? Any possible drawbacks? What about an enco-den.tex file? Patrick -- ConTeXt wiki and more: http://contextgarden.net
Patrick Gundlach said this at Fri, 19 Aug 2005 13:53:53 +0200:
Sure. (didn't send to main list 'cos it's rough, still)
btw: 'dense' can now be a parameter to http://fun.contextgarden.net/encodingtable/enctable.rb?
Hmm. I'm not sure I understand what it's supposed to do.
So there is no /plus and /ampersand (and alike)? How would these be accessed? By making the chars active? Any possible drawbacks?
Correct, at least according to the original brief I was given, which I interpreted as: letters only, plus punctuation that comes in a normal text flow that is likely to be a participant in kerns and/or ligatures. So, no math or currency symbols (I can't remember why I put \equal back in there, so it's out again) as they can be treated as symbols and called within other fonts. It's up to you all to tell me the drawbacks of making these characters active, however...
What about an enco-den.tex file?
It'll come once there's a consensus on this layout.
-- Adam T. Lindsay
Hi,
Hmm. I'm not sure I understand what it's supposed to do.
Well, I was comparing ec, tex256 and dense encoding, so I needed a different view on the encoding.
Patrick
Adam Lindsay wrote:
Correct, at least according to the original brief I was given, which I interpreted as: letters only, plus punctuation that comes in a normal text flow that is likely to be a participant in kerns and/or ligatures. So, no math or currency symbols (I can't remember why I put \equal back in there, so it's out again) as they can be treated as symbols and called within other fonts.
there is no reason to make them active; in the worst case we can preprocess to $=$ or &equal;
It's up to you all to tell me the drawbacks of making these characters active, however...
if possible, no active chars below 127
Hans
Patrick Gundlach wrote:
Sure. (didn't send to main list 'cos it's rough, still)
btw: 'dense' can now be a parameter to http://fun.contextgarden.net/encodingtable/enctable.rb?
So there is no /plus and /ampersand (and alike)? How would these be accessed? By making the chars active? Any possible drawbacks?
if we talk utf ... they can be symbols and accessed with \getglyph from a companion font; they don't play a role in hyphenation
What about an enco-den.tex file?
sure, just provide it ...
Hans
btw., since you took ec as a starting point:

in ec, but not in dense
["grave", "acute", "circumflex", "tilde", "dieresis", "hungarumlaut", "ring", "caron",
 "breve", "macron", "dotaccent", "cedilla", "ogonek", "guilsinglleft", "guilsinglright",
 "cwm", "zeroinferior", "dotlessj", "visualspace", "quotedbl", "numbersign", "dollar",
 "percent", "ampersand", "asterisk", "plus", "less", "greater", "backslash",
 "asciicircum", "underscore", "braceleft", "bar", "braceright", "asciitilde", "Ng",
 "Tcedilla", "IJ", "dbar", "section", "ng", "tquoteright", "tcedilla", "ij", "sterling",
 "Germandbls"]

in dense, but not in ec
["Amacron", "Cdotaccent", "Edotaccent", "Emacron", "Gcommaaccent", "Gdotaccent", "Hbar",
 "Imacron", "Iogonek", "Kcommaaccent", "Lcommaaccent", "dcroat", "amacron", "cdotaccent",
 "edotaccent", "emacron", "gcommaaccent", "gdotaccent", "hbar", "imacron", "iogonek",
 "kcommaaccent", "lcommaaccent", "Omacron", "Umacron", "Uogonek", "omacron", "umacron",
 "uogonek", "Ncommaaccent", "Tcommaaccent", "Rcommaaccent", "Scommaaccent",
 "Wcircumflex", "Ycircumflex", "ncommaaccent", "tcaron", "tcommaaccent", "rcommaaccent",
 "scommaaccent", "wcircumflex", "ycircumflex", "Ygrave", "ygrave"]

Patrick
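A two-way comparison like the one above can be reproduced with a short script. What follows is only a minimal Python sketch, not the code behind enctable.rb: it assumes dvips-style .enc text, and the two heavily shortened vectors (`EC_SAMPLE`, `DENSE_SAMPLE`) and the helper names `glyph_names` and `only_in` are made up for illustration.

```python
import re

def glyph_names(enc_text):
    """Return the glyph names listed in a dvips-style encoding vector."""
    # Drop %-comments, then keep only what sits between '[' and ']'.
    no_comments = re.sub(r"%.*", "", enc_text)
    match = re.search(r"\[(.*?)\]", no_comments, re.S)
    if match is None:
        return []
    return re.findall(r"/([^\s/\[\]]+)", match.group(1))

def only_in(first, second):
    """Names in `first` but not in `second`, in first-seen order, skipping /.notdef."""
    seen = set(second)
    result = []
    for name in first:
        if name != ".notdef" and name not in seen and name not in result:
            result.append(name)
    return result

# Hypothetical sample vectors (assumption: real files list all 256 slots).
EC_SAMPLE = """
/ECEncoding [ % sample only
  /grave /acute /cwm /plus /A /Agrave
] def
"""
DENSE_SAMPLE = """
/DenseEncoding [
  /A /Agrave /Amacron /.notdef
] def
"""

ec = glyph_names(EC_SAMPLE)
dense = glyph_names(DENSE_SAMPLE)
print("in ec, but not in dense:", only_in(ec, dense))
print("in dense, but not in ec:", only_in(dense, ec))
```

Keeping first-seen order rather than sorting preserves the slot order of the source encoding, which makes it easier to spot groups such as the combining accents at the start of ec.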
Patrick Gundlach wrote:
btw., since you took ec as a starting point:
in ec, but not in dense
["grave", "acute", "circumflex", "tilde", "dieresis", "hungarumlaut", "ring", "caron", "breve", "macron", "dotaccent", "cedilla", "ogonek",
these are combinable accents
"guilsinglleft", "guilsinglright",
why are these out when their "double" companions are still in?
"cwm",
I've asked for the compound word marker to return, but: for proper URL names (almost) all of ascii is needed, including a number of the ones that were dropped already, like "percent", "plus" and "asciitilde". Adding all of those chars would revert the encoding back towards EC quite a bit, so maybe we'd better switch to texnansi for URLs anyway. That means cwm is not immediately needed, either
"zeroinferior",
what a weird thing is this, anyway?
"dotlessj",
was needed for combinable accents
"visualspace",
only relevant in special verbatims, as used by DEK. ;-)
"quotedbl", "numbersign", "dollar", "percent", "ampersand", "asterisk", "plus", "less", "greater", "backslash", "asciicircum", "underscore", "braceleft", "bar", "braceright", "asciitilde",
ascii symbols
"Ng", "ng",
That's the letter Eng, used in sami (and in phonetics). Useless since the rest of the sami letters are not included.
"Tcedilla",
Tcommaaccent?
"IJ", "ij",
Hey, That's dutch! :-) Officially it is a "digraph with casing hint" or so, and not really part of the alphabet. But otoh, we are supposed to type "IJsland" instead of "Ijsland" (iceland), and it is an official character in unicode.
"dbar",
wasn't this something technical?
"section",
"tquoteright",
tcaron?
"tcedilla",
tcommaaccent?
"sterling",
to be replaced by "euro" ;-)
"Germandbls"
That is the "SS" right? It was needed to make \uppercase{straße} work, but I doubt that is correct (because it really isn't a single character)
Cheers, Taco
Patrick, thanks for the diff.
Taco Hoekwater said this at Fri, 19 Aug 2005 14:45:26 +0200:
Patrick Gundlach wrote:
btw., since you took ec as a starting point:
in ec, but not in dense
["grave", "acute", "circumflex", "tilde", "dieresis", "hungarumlaut", "ring", "caron", "breve", "macron", "dotaccent", "cedilla", "ogonek",
these are combinable accents
Indeed.
"guilsinglleft", "guilsinglright",
why are these out when their "double" companions are still in?
As Hans pointed out to me, the single ones aren't really used typographically, as far as ConTeXt is concerned.
"cwm",
I've asked for the compound word marker to return, but: for proper URL names (almost) all of ascii is needed, including a number of the ones that were dropped already, like "percent", "plus" and "asciitilde".
Adding all of those chars would revert the encoding back towards EC quite a bit, so maybe we'd better switch to texnansi for URLs anyway. That means cwm is not immediately needed, either
Ah, hmm. A point I hadn't really considered. I've considered using TeX 'n' ANSI as the "companion" font anyway, so that is very plausible as a workaround.
"zeroinferior",
what a weird thing is this, anyway?
I think it's there to help compose a perthousand/permyriad symbol.
"dotlessj",
was needed for combinable accents
especially for math...
"visualspace",
only relevant in special verbatims, as used by DEK. ;-)
"quotedbl", "numbersign", "dollar", "percent", "ampersand", "asterisk", "plus", "less", "greater", "backslash", "asciicircum", "underscore", "braceleft", "bar", "braceright", "asciitilde",
ascii symbols
"Ng", "ng",
That's the letter Eng, used in sami (and in phonetics). Useless since the rest of the sami letters are not included.
Yup.
"Tcedilla",
Tcommaaccent?
I investigated it, and Tcedilla is really a mislabeled Tcommaaccent. <http://groups.msn.com/fontlab/tipsandtricks.msnw?action=get_message&mview=1&ID_Message=3233>
"IJ", "ij",
Hey, That's dutch! :-)
:P
Officially it is a "digraph with casing hint" or so, and not really part of the alphabet. But otoh, we are supposed to type "IJsland" instead of "Ijsland" (iceland), and it is an official character in unicode.
I asked Hans. He didn't want it, didn't use it, and thought it was ugly. Beauty over truth. :)
"dbar",
wasn't this something technical?
I thought it was a mis-labeled dcroat. (However, these names within fonts are awfully fluid. I wouldn't call it that for the stroked-d in Vietnamese, but that seems to be what the Adobe Glyph List favors.)
"section",
"tquoteright",
tcaron?
oops. I think I accommodated the LM glyph names a little too much. Thanks.
"tcedilla",
tcommaaccent?
as above.
"sterling",
to be replaced by "euro" ;-)
not any time soon, mate. :)
"Germandbls"
That is the "SS" right? It was needed to make \uppercase{straße} work, but I doubt that is correct (because it really isn't a single character)
That was my view on it.
-- Adam T. Lindsay
Adam Lindsay said this at Fri, 19 Aug 2005 14:07:29 +0100:
"tquoteright",
tcaron?
oops. I think I accommodated the LM glyph names a little too much. Thanks.
Wait. I got this right. Apologies retracted :). Yes, tcaron is the more correct name for tquoteright.
-- Adam T. Lindsay
Adam Lindsay wrote:
"guilsinglleft", "guilsinglright",
why are these out when their "double" companions are still in?
As Hans pointed out to me, the single ones aren't really used typographically, as far as ConTeXt is concerned.
Ah, I see.
I've considered using TeX 'n' ANSI as the "companion" font anyway, so that is very plausible as a workaround.
In that case, I propose a completely new companion encoding with a name like "asciibats", that has all of the visible ascii glyphs in their normal spots, and the other text-like symbols in the upper range.
"Tcedilla",
Tcommaaccent?
I investigated it, and Tcedilla is really a mislabeled Tcommaaccent. <http://groups.msn.com/fontlab/tipsandtricks.msnw?action=get_message&mview=1&ID_Message=3233>
I thought so. Even the unicode book hints that "T with cedilla" was perhaps a mistake
"IJ", "ij",
I asked Hans. He didn't want it, didn't use it, and thought it was ugly. Beauty over truth. :)
Fine by me.
"tquoteright",
tcaron?
oops. I think I accommodated the LM glyph names a little too much. Thanks.
yours is right (caron), EC is wrong (quoteright) Taco
Taco Hoekwater said this at Fri, 19 Aug 2005 15:32:00 +0200:
I've considered using TeX 'n' ANSI as the "companion" font anyway, so that is very plausible as a workaround.
In that case, I propose a completely new companion encoding with a name like "asciibats", that has all of the visible ascii glyphs in their normal spots, and the other text-like symbols in the upper range.
Cool. That'll be next. Any non-obvious suggestions off the top of your head?
-- Adam T. Lindsay
Hi,
Cool. That'll be next. Any non-obvious suggestions off the top of your head?
not non-obvious, but you might consider ts1 as a starting point/inspiration. http://fun.contextgarden.net/encodingtable/enctable.rb?ts1
Patrick
Adam Lindsay wrote:
Taco Hoekwater said this at Fri, 19 Aug 2005 15:32:00 +0200:
I've considered using TeX 'n' ANSI as the "companion" font anyway, so that is very plausible as a workaround.
In that case, I propose a completely new companion encoding with a name like "asciibats", that has all of the visible ascii glyphs in their normal spots, and the other text-like symbols in the upper range.
Cool. That'll be next. Any non-obvious suggestions off the top of your head?
For glyphs to be included you mean? At least everything in /AdobeStandard, MacRoman Encoding and the Windows 1252 codepage that's not in /AdamsDenseEncoding already. Besides that, I'm not sure. LaTeX's TS1 has lots of expert glyphs, but the whole concept of expert fonts is rapidly becoming obsolete... The rest
Hi,
For glyphs to be included you mean? At least everything in /AdobeStandard, MacRoman Encoding and the Windows 1252 codepage that's not in /AdamsDenseEncoding already.
are there .enc files for MacRoman or CP 1252? Google wasn't helpful. If there are, I could generate diffs.
Patrick
are there .enc files for MacRoman or CP 1252? Google wasn't helpful. If there are, I could generate diffs.
latin1 taken from: http://swtch.com/usr/local/plan9/postscript/prologues/Latin1.enc
win1252 = latin1 + /usr/lib/X11/fonts/encodings/microsoft-cp1252.enc ??
macenc (the one Taco sent me).

I might check against the unicode.org versions. Perhaps one of you could have a look, I'll send the diffs later. (Monday?)

dec | oct | hex | WIN1252Encoding   | ISOLatin1Encoding | MacintoshEncoding
---------------------------------------------------------------------------
... [omitted, std ascii]
126 | 176 | 7e  | asciitilde        | asciitilde        | tilde
127 | 177 | 7f  | ---               | ---               | ---
---------------------------------------------------------------------------
128 | 200 | 80  | euro              | ---               | Adieresis
129 | 201 | 81  | ---               | ---               | Aring
130 | 202 | 82  | quotesinglbase    | ---               | Ccedilla
131 | 203 | 83  | florin            | ---               | Eacute
132 | 204 | 84  | quotedblbase      | ---               | Ntilde
133 | 205 | 85  | ellipsis          | ---               | Odieresis
134 | 206 | 86  | dagger            | ---               | Udieresis
135 | 207 | 87  | daggerdbl         | ---               | aacute
136 | 210 | 88  | circumflex        | ---               | agrave
137 | 211 | 89  | perthousand       | ---               | acircumflex
138 | 212 | 8a  | Scaron            | ---               | adieresis
139 | 213 | 8b  | guilsinglleft     | ---               | atilde
140 | 214 | 8c  | OE                | ---               | aring
141 | 215 | 8d  | Zcaron            | ---               | ccedilla
142 | 216 | 8e  | quoteleft         | ---               | eacute
143 | 217 | 8f  | quoteright        | ---               | egrave
144 | 220 | 90  | dotlessi          | dotlessi          | ecircumflex
145 | 221 | 91  | quoteleft         | grave             | edieresis
146 | 222 | 92  | quoteright        | acute             | iacute
147 | 223 | 93  | quotedblleft      | circumflex        | igrave
148 | 224 | 94  | quotedblright     | tilde             | icircumflex
149 | 225 | 95  | bullet            | macron            | idieresis
150 | 226 | 96  | endash            | breve             | ntilde
151 | 227 | 97  | emdash            | dotaccent         | oacute
152 | 230 | 98  | tilde             | dieresis          | ograve
153 | 231 | 99  | trademark         | ---               | ocircumflex
154 | 232 | 9a  | scaron            | ring              | odieresis
155 | 233 | 9b  | guilsinglright    | cedilla           | otilde
156 | 234 | 9c  | oe                | ---               | uacute
157 | 235 | 9d  | hungarumlaut      | hungarumlaut      | ugrave
158 | 236 | 9e  | zcaron            | ogonek            | ucircumflex
159 | 237 | 9f  | Ydieresis         | caron             | udieresis
---------------------------------------------------------------------------
160 | 240 | a0  | space             | space             | dagger
161 | 241 | a1  | exclamdown        | exclamdown        | degree
162 | 242 | a2  | cent              | cent              | cent
163 | 243 | a3  | sterling          | sterling          | sterling
164 | 244 | a4  | currency          | currency          | section
165 | 245 | a5  | yen               | yen               | bullet
166 | 246 | a6  | brokenbar         | brokenbar         | paragraph
167 | 247 | a7  | section           | section           | germandbls
168 | 250 | a8  | dieresis          | dieresis          | registered
169 | 251 | a9  | copyright         | copyright         | copyright
170 | 252 | aa  | ordfeminine       | ordfeminine       | trademark
171 | 253 | ab  | guillemotleft     | guillemotleft     | acute
172 | 254 | ac  | logicalnot        | logicalnot        | dieresis
173 | 255 | ad  | hyphen            | hyphen            | notequal
174 | 256 | ae  | registered        | registered        | AE
175 | 257 | af  | macron            | macron            | Oslash
176 | 260 | b0  | degree            | degree            | infinity
177 | 261 | b1  | plusminus         | plusminus         | plusminus
178 | 262 | b2  | twosuperior       | twosuperior       | lessequal
179 | 263 | b3  | threesuperior     | threesuperior     | greaterequal
180 | 264 | b4  | acute             | acute             | yen
181 | 265 | b5  | mu                | mu                | mu
182 | 266 | b6  | paragraph         | paragraph         | partialdiff
183 | 267 | b7  | periodcentered    | periodcentered    | Sigma
184 | 270 | b8  | cedilla           | cedilla           | product
185 | 271 | b9  | onesuperior       | onesuperior       | pi
186 | 272 | ba  | ordmasculine      | ordmasculine      | integral
187 | 273 | bb  | guillemotright    | guillemotright    | ordfeminine
188 | 274 | bc  | onequarter        | onequarter        | ordmasculine
189 | 275 | bd  | onehalf           | onehalf           | Omega
190 | 276 | be  | threequarters     | threequarters     | ae
191 | 277 | bf  | questiondown      | questiondown      | oslash
---------------------------------------------------------------------------
192 | 300 | c0  | Agrave            | Agrave            | questiondown
193 | 301 | c1  | Aacute            | Aacute            | exclamdown
194 | 302 | c2  | Acircumflex       | Acircumflex       | logicalnot
195 | 303 | c3  | Atilde            | Atilde            | radical
196 | 304 | c4  | Adieresis         | Adieresis         | florin
197 | 305 | c5  | Aring             | Aring             | approxequal
198 | 306 | c6  | AE                | AE                | Delta
199 | 307 | c7  | Ccedilla          | Ccedilla          | guillemotleft
200 | 310 | c8  | Egrave            | Egrave            | guillemotright
201 | 311 | c9  | Eacute            | Eacute            | ellipsis
202 | 312 | ca  | Ecircumflex       | Ecircumflex       | space
203 | 313 | cb  | Edieresis         | Edieresis         | Agrave
204 | 314 | cc  | Igrave            | Igrave            | Atilde
205 | 315 | cd  | Iacute            | Iacute            | Otilde
206 | 316 | ce  | Icircumflex       | Icircumflex       | OE
207 | 317 | cf  | Idieresis         | Idieresis         | oe
208 | 320 | d0  | Eth               | Eth               | endash
209 | 321 | d1  | Ntilde            | Ntilde            | emdash
210 | 322 | d2  | Ograve            | Ograve            | quotedblleft
211 | 323 | d3  | Oacute            | Oacute            | quotedblright
212 | 324 | d4  | Ocircumflex       | Ocircumflex       | quoteleft
213 | 325 | d5  | Otilde            | Otilde            | quoteright
214 | 326 | d6  | Odieresis         | Odieresis         | divide
215 | 327 | d7  | multiply          | multiply          | lozenge
216 | 330 | d8  | Oslash            | Oslash            | ydieresis
217 | 331 | d9  | Ugrave            | Ugrave            | Ydieresis
218 | 332 | da  | Uacute            | Uacute            | fraction
219 | 333 | db  | Ucircumflex       | Ucircumflex       | currency
220 | 334 | dc  | Udieresis         | Udieresis         | guilsinglleft
221 | 335 | dd  | Yacute            | Yacute            | guilsinglright
222 | 336 | de  | Thorn             | Thorn             | fi
223 | 337 | df  | germandbls        | germandbls        | fl
---------------------------------------------------------------------------
224 | 340 | e0  | agrave            | agrave            | daggerdbl
225 | 341 | e1  | aacute            | aacute            | periodcentered
226 | 342 | e2  | acircumflex       | acircumflex       | quotesinglbase
227 | 343 | e3  | atilde            | atilde            | quotedblbase
228 | 344 | e4  | adieresis         | adieresis         | perthousand
229 | 345 | e5  | aring             | aring             | Acircumflex
230 | 346 | e6  | ae                | ae                | Ecircumflex
231 | 347 | e7  | ccedilla          | ccedilla          | Aacute
232 | 350 | e8  | egrave            | egrave            | Edieresis
233 | 351 | e9  | eacute            | eacute            | Egrave
234 | 352 | ea  | ecircumflex       | ecircumflex       | Iacute
235 | 353 | eb  | edieresis         | edieresis         | Icircumflex
236 | 354 | ec  | igrave            | igrave            | Idieresis
237 | 355 | ed  | iacute            | iacute            | Igrave
238 | 356 | ee  | icircumflex       | icircumflex       | Oacute
239 | 357 | ef  | idieresis         | idieresis         | Ocircumflex
240 | 360 | f0  | eth               | eth               | ---
241 | 361 | f1  | ntilde            | ntilde            | Ograve
242 | 362 | f2  | ograve            | ograve            | Uacute
243 | 363 | f3  | oacute            | oacute            | Ucircumflex
244 | 364 | f4  | ocircumflex       | ocircumflex       | Ugrave
245 | 365 | f5  | otilde            | otilde            | dotlessi
246 | 366 | f6  | odieresis         | odieresis         | circumflex
247 | 367 | f7  | divide            | divide            | tilde
248 | 370 | f8  | oslash            | oslash            | macron
249 | 371 | f9  | ugrave            | ugrave            | breve
250 | 372 | fa  | uacute            | uacute            | dotaccent
251 | 373 | fb  | ucircumflex       | ucircumflex       | ring
252 | 374 | fc  | udieresis         | udieresis         | cedilla
253 | 375 | fd  | yacute            | yacute            | hungarumlaut
254 | 376 | fe  | thorn             | thorn             | ogonek
255 | 377 | ff  | ydieresis         | ydieresis         | caron

Patrick
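A side-by-side listing in this dec/oct/hex layout is easy to regenerate once each encoding is available as a slot-to-name mapping. The Python below is a rough sketch under that assumption; the function name `table_rows` and the three-slot `SAMPLE` data are illustrative only (the values are copied from slots 128 to 130 of the table above), and it is not the script that produced the original listing.

```python
def table_rows(encodings, first=0, last=255):
    """Yield 'dec | oct | hex | name | ...' rows for slots first..last."""
    for slot in range(first, last + 1):
        # An unassigned slot is shown as '---', as in the listing above.
        names = [vector.get(slot, "---") for vector in encodings.values()]
        cells = [f"{slot:3d}", f"{slot:03o}", f"{slot:02x} "]
        cells += [f"{name:<17}" for name in names]
        yield " | ".join(cells).rstrip()

# Illustrative sample: a full vector would map all 256 slots.
SAMPLE = {
    "WIN1252Encoding": {128: "euro", 130: "quotesinglbase"},
    "ISOLatin1Encoding": {},
    "MacintoshEncoding": {128: "Adieresis", 129: "Aring", 130: "Ccedilla"},
}

header = ["dec", "oct", "hex"] + [f"{name:<17}" for name in SAMPLE]
print(" | ".join(header).rstrip())
for row in table_rows(SAMPLE, 128, 130):
    print(row)
```

Storing each vector as a dictionary keyed by slot keeps gaps explicit, so a missing entry cannot silently shift the names that follow it.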
Hi,
For glyphs to be included you mean? At least everything in /AdobeStandard, MacRoman Encoding and the Windows 1252 codepage that's not in /AdamsDenseEncoding already.
OK, here is the list:

Omegagreek acute ampersand apple approxequal asciicircum asciitilde asterisk backslash
braceleft braceright breve brokenbar bullet caron cedilla cent circumflex copyright
currency dagger daggerdbl degree dieresis divide dollar dotaccent ellipsis euro florin
fraction grave greater greaterequal guilsinglleft guilsinglright hungarumlaut increment
infinity integral less lessequal logicalnot lozenge mu multiply nonbreakingspace
notequal numbersign ogonek onehalf onequarter onesuperior ordfeminine ordmasculine
overscore paragraph partialdiff percent periodcentered perthousand pi plus plusminus
product quotedbl quotesingle radical registered ring section softhyphen space sterling
summation threequarters threesuperior tilde trademark twosuperior underscore
verticalbar yen

(macroman, latin1, win1252 - I've taken the encodings from ftp.unicode.org and converted them to dvips compatible enc files)
Patrick
For glyphs to be included you mean? At least everything in /AdobeStandard, MacRoman Encoding and the Windows 1252 codepage that's not in /AdamsDenseEncoding already.
And now with ase instead of latin1. Sorry for the noise (too late now).

Omegagreek acute ampersand apple approxequal asciicircum asciitilde asterisk backslash
bar braceleft braceright breve brokenbar bullet caron cedilla cent circumflex copyright
currency dagger daggerdbl degree dieresis divide dollar dotaccent ellipsis euro florin
fraction grave greater greaterequal guilsinglleft guilsinglright hungarumlaut increment
infinity integral less lessequal logicalnot lozenge macron mu multiply nonbreakingspace
notequal numbersign ogonek onehalf onequarter onesuperior ordfeminine ordmasculine
overscore paragraph partialdiff percent periodcentered perthousand pi plus plusminus
product quotedbl quotesingle radical registered ring section softhyphen space sterling
summation threequarters threesuperior tilde trademark twosuperior underscore
verticalbar yen

Patrick
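The list of candidates for a companion encoding is the union of the reference encodings minus whatever dense already covers. A minimal Python sketch of that set arithmetic follows; `missing_from` and the tiny sample sets are hypothetical stand-ins, not the real encoding vectors.

```python
def missing_from(dense, *others):
    """Sorted glyph names found in any of `others` but absent from `dense`."""
    wanted = set().union(*others)
    return sorted(wanted - set(dense))

# Hypothetical excerpts; the real inputs would be the full glyph lists.
dense_sample = {"A", "Agrave", "quotedblleft"}
macroman_sample = {"A", "Omegagreek", "apple", "quotedblleft"}
ase_sample = {"A", "acute", "florin"}
win1252_sample = {"A", "Agrave", "euro", "florin"}

print(" ".join(missing_from(dense_sample, macroman_sample, ase_sample, win1252_sample)))
```

Python's default string sort puts uppercase names before lowercase ones, which matches the ordering of the list above, where Omegagreek precedes acute.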
Taco Hoekwater wrote:
In that case, I propose a completely new companion encoding with a name like "asciibats", that has all of the visible ascii glyphs in their normal spots, and the other text-like symbols in the upper range.
indeed, that would also solve the url and filename problem. i like the name asciibats -)
Hans
Hi,
Patrick, thanks for the diff.
No problem, just another very quick hack.
"guilsinglleft", "guilsinglright",
why are these out when their "double" companions are still in?
As Hans pointed out to me, the single ones aren't really used typographically, as far as ConTeXt is concerned.
you mean those < and >? They are used (e.g. Switzerland, France?). And I think that those are sensitive to correct kerning with other letters.
"sterling",
to be replaced by "euro" ;-)
:))
"Germandbls"
That is the "SS" right? It was needed to make \uppercase{straße} work, but I doubt that is correct (because it really isn't a single character)
That was my view on it.
\uppercase{ß} should be SS (or SZ, but the former is preferred). No doubt about that. But I am not that much of a TeXnican to say that this glyph is necessary in the fontencoding.

BTW: I think that 'space' is used in metapost, when not typesetting with btex...etex. So we might want to leave it there or the user should use a different encoding. (Relevant? I don't know.)
Patrick
Patrick Gundlach wrote:
\uppercase{ß} should be SS (or SZ, but the former is preferred). No doubt about that. But I am not that much of a TeXnican to say that this glyph is necessary in the fontencoding.
It is interesting that \lowercase{SS} is not (always) ß. Anyway, I think the \SS is a bit of silliness introduced because EC had too much German influence ;-)
BTW: I think that 'space' is used in metapost, when not typesetting with btex...etex. So we might want to leave it there or the user should use a different encoding. (Relevant? I don't know.)
It is, but metapost will never default to this font without extra macro programming. (personally I prefer to, and encourage, the use of AdobeStandardEncoding in metapost labels that do not get passed through TeX). Taco
It is, but metapost will never default to this font without extra macro programming.
Not even in \startMPenvironment[+] ... \stopMPenvironment (or whatever this is called in ConTeXt?)
Patrick
Patrick Gundlach wrote:
It is, but metapost will never default to this font without extra macro programming.
Not even in
\startMPenvironment[+]
... \stopMPenvironment
(or whatever this is called in ConTeXt?)
probably possible, but that constitutes "extra macro programming" in my book ;-) Taco
It is, but metapost will never default to this font without extra macro programming. Not even in \startMPenvironment[+] ... \stopMPenvironment (or whatever this is called in ConTeXt?)
probably possible, but that constitutes "extra macro programming" in my book ;-)
Well, right, of course :), but it seems to me that a font in this new encoding would actually be used by context *and* inline-metafun. Unless, of course, the magic in \startMPenvironment changes. So, if we provide a fontencoding that will be used by metafun, at least the space should look right. Or, \startMPenvironment should choose a different encoding for (ConTeXt-)typesetting and metafun. But what would the rules for this decision look like?

Currently (iirc) I can say

\startMPenvironment[+]
\setupencoding[default=newencoding]
\switchtobodyfont[patrick]
\stopMPenvironment

Text should now be in newencoding and metafun stuff in ???.
Patrick (unless I am totally wrong)
Patrick Gundlach wrote:
It is, but metapost will never default to this font without extra macro programming.
Not even in \startMPenvironment[+] ... \stopMPenvironment (or whatever this is called in ConTeXt?)
probably possible, but that constitutes "extra macro programming" in my book ;-)
Well, right, of course :), but it seems to me that a font in this new encoding would actually be used by context *and* inline-metafun.
Metafun labels should be (and probably are) typeset by ConTeXt, but anyway, it is probably prudent to keep the space at 32. Taco
Patrick Gundlach wrote:
Well, right, of course :), but it seems to me that a font in this new encoding would actually be used by context *and* inline-metafun. Unless, of course, the magic in \startMPenvironment changes. So, if we provide a fontencoding that will be used by metafun, at least the space should look right. Or, \startMPenvironment should choose a different encoding for (ConTeXt-)typesetting and metafun. But what would the rules for this decision look like?
Currently (iirc) I can say
\startMPenvironment[+]
\setupencoding[default=newencoding]
\switchtobodyfont[patrick]
\stopMPenvironment
Text should now be in newencoding and metafun stuff in ???.
keep in mind that currently tex does not use a space either; btex .. etex will therefore not suffer from the lack of a space

Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74
www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
Patrick Gundlach wrote:
BTW: I think that 'space' is used in metapost, when not typesetting with btex...etex. So we might want to leave it there or the user should use a different encoding. (Relevant? I don't know.)
in btex .. etex no space is used (tex makes glue of it) but in 'infont' it is used

Hans
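The difference Hans describes can be seen in a small ConTeXt test file. This is only a sketch: the label text and coordinates are made up for illustration, it assumes that \startMPcode and the plain MetaPost variable defaultfont are available, and the exact handling of infont may differ between ConTeXt versions.

```tex
\starttext
\startMPcode
  % Typeset by TeX: the space between the words becomes glue,
  % so no glyph from slot 32 of the font is ever requested.
  label(btex two words etex, origin);
  % Set directly by MetaPost: every character of the string,
  % including the space, is looked up in the font, so whatever
  % the encoding puts in slot 32 ends up in the picture.
  label("two words" infont defaultfont, (0, -1cm));
\stopMPcode
\stoptext
```

If slot 32 of the encoding holds something other than a blank space, only the second label is affected, which is the case for keeping the space where it is.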
Adam Lindsay wrote:
Officially it is a "digraph with casing hint" or so, and not really part of the alphabet. But otoh, we are supposed to type "IJsland" instead of "Ijsland" (Iceland), and it is an official character in Unicode.
I asked Hans. He didn't want it, didn't use it, and thought it was ugly. Beauty over truth. :)
my error, i've never been to IJsland, but in summer eating IJsjes makes sense

Hans
Taco Hoekwater wrote:
I've asked for the compound word marker to return, but: for proper URL names (almost) all of ascii is needed, including a number of the ones that were dropped already, like "percent", "plus" and "asciitilde".
unless we make a 'url and filename' encoding :-)
Adding all of those chars would revert the encoding back towards EC quite a bit, so maybe we'd better switch to texnansi for URLs anyway. That means cwm is not immediately needed, either
so the advised font encodings will be: texnansi, qx, t5, dense

Hans
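For the archives, the non-marking \hyphenchar behind the /cwm request can be sketched in plain TeX as follows. The slot number is an assumption taken from the EC (T1) encoding, where the compound word marker sits in slot 23; the thread does not say where it would go in the dense encoding, and texnansi may not offer it at all.

```tex
% Sketch, assuming an EC (T1) encoded font, where slot 23 holds
% the zero-width compound word marker (cwm).
\font\urlfont=ectt1000 % EC typewriter at 10pt; any EC font would do

% Breaks that TeX inserts in this font now add the invisible cwm
% instead of a visible hyphen. Note that \hyphenchar assignments
% are global, so grouping does not undo this setting.
\hyphenchar\urlfont=23
```

With a setting like this, a URL broken across lines shows no hyphen that a reader could mistake for part of the address.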
And while we're at it, for the archives, I'll try to go through the (European Language) justifications for what's added.

Patrick Gundlach said this at Fri, 19 Aug 2005 14:12:12 +0200:
"dcroat"
(correction of dbar)
"tcaron",
(correction of tquoteright)
"rcommaaccent", "kcommaaccent", "lcommaaccent", "ncommaaccent", "gcommaaccent"
Latvian
"amacron", "emacron", "imacron", "omacron"
Cornish & Latvian
"umacron"
Cornish, Lithuanian & Latvian
"edotaccent", "iogonek", "uogonek"
Lithuanian
"scommaaccent", "tcommaaccent"
(correction from tcedilla) Romanian
"wcircumflex", "ycircumflex", "ygrave"
Welsh
"cdotaccent", "gdotaccent", "hbar"
Maltese

[fyi: Cornish wasn't really a criterion, but it benefited from the letters added and came along for the ride!]

My source for the details was http://www.eki.ee/letter/, but I had some of these sketched out before.

Adam T. Lindsay
Adam Lindsay wrote:
Sure. (didn't send to main list 'cos it's rough, still) I notice in the context of this latest discussion that I use Eth as a stand-in for Dcroat, mostly because I know slightly more about Icelandic than I do about Croatian. Although these slots are precious, perhaps I shouldn't do that.
There are extremely few encodings (if any) that include Dcroat (its users don't have a strong TeX user group, such as the Polish or the Czech one, to rule the world). The two glyphs look the same, but using Eth as a stand-in has drawbacks:

- hyphenation with Dcroat doesn't work
- you perhaps cannot properly lowercase/uppercase words
- copy/search in the resulting documents will be wrong

Well, you probably know better whether it would hurt you to have Dcroat instead of Eth. If slots are expensive enough, you can still make some language-specific hacks (redefine the lccode of Eth to point to dcroat, fool Acrobat about the Unicode value of Eth ...). It would be nice to have Dcroat (it would be one of the extremely rare encodings with proper support for South Slavic languages), but well -- NOT at any price.

Esperanto needs some more glyphs (5 or 6 times two), but I think that there are too many and that the language is too rarely used to sacrifice the slots.

I skimmed the list on Wikipedia (http://en.wikipedia.org/wiki/Alphabets_derived_from_the_Latin).

http://en.wikipedia.org/wiki/U-breve
Do you consider ubreve important enough to be added? It is used in Esperanto (which cannot be typeset with this set of characters anyway) and in the Latin transcription of Belarusian (perhaps a good reason, but almost surely the only one). The Unicode reference also states that it is used for Latin, but I didn't find any other reference for that. We also use it in phonetics, but that's just another bad argument (on the one hand it can be composed, and on the other it doesn't make much difference whether 9 or 10 glyphs are missing).

Thanks for the encoding, Adam, it's a great idea to mix two encodings together to get as magnificent results as possible! I hope it will be ready soon to be used in ConTeXt ;) Will another complementary encoding be made afterwards, with as many "non-letter" glyphs as possible, to be used together with this one?

Mojca
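The language-specific hack mentioned above (redefining the lccode of Eth to point to dcroat) could look roughly like this in plain TeX. It is a sketch only: it uses the slot numbers of the existing EC (T1) encoding, where Eth and Dcroat share slot "D0, eth sits at "F0 and dcroat at "9E, and the dense encoding may place them elsewhere. The macro name \croatiancasing is made up for illustration.

```tex
% Sketch, assuming EC (T1) slot numbers; the dense encoding may differ.
%   "D0 = Eth, doubling as Dcroat    "F0 = eth    "9E = dcroat
% \croatiancasing is a hypothetical helper name, not a ConTeXt command.
\def\croatiancasing{%
  \lccode"D0="9E % lowercasing the shared capital now yields dcroat
  \uccode"9E="D0 % uppercasing dcroat yields the shared capital
  \lccode"9E="9E % dcroat counts as a letter for hyphenation
}

\font\ec=ecrm1000 \ec % assumes the EC fonts are installed

% Keep the change local, so that Icelandic text elsewhere in the
% document still lowercases Eth to eth.
\begingroup
  \croatiancasing
  \lowercase{^^d0AKOVO}% the initial ^^d0 is mapped to ^^9e
\endgroup
\bye
```

This covers only the casing and hyphenation drawbacks. Copy and search in the resulting PDF would still report Eth unless the Unicode mapping is adjusted separately.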
Hi Mojca,
Will another complementary encoding be made afterwards, with as many "non-letter" glyphs as possible, to be used together with this one?
That is what I understood. See the rest of this thread.

Patrick
Hi Mojca,

Mojca Miklavec said this at Sat, 20 Aug 2005 17:01:04 +0200:
It would be nice to have Dcroat (it would be one of the extremely rare encodings with proper support for South Slavic languages), but well -- NOT at any price.
There's a parallel discussion on tex-fonts on Tcedilla/Tcommaaccent that really made it clear to me that if we're going to do this "right", Dcroat *must* be in the dense encoding. Fortunately, there's a spot for it, and it gets top priority now.
Esperanto needs some more glyphs (5 or 6 times two), but I think that there are too many and that the language is too rarely used to sacrifice the slots.
I looked at Esperanto, but fairly early on decided that it had too high a price for too few rewards. I'm rearranging the layout now, and should have another prototype on the list soon...

Adam T. Lindsay
Adam Lindsay said this at Sat, 20 Aug 2005 16:22:25 +0100:
I'm rearranging the layout now, and should have another prototype on the list soon...
Okay, my daughter stayed asleep for long enough for me to get this out...

http://homepage.mac.com/atl/tex/dense2.pdf

From the attached file:

This is a new (2005) encoding optimised for text usage in a Unicode world. It eschews all accents for as many fully formed glyphs as possible. It sets aside all punctuation that is:

1) not part of a typical text flow,
2) normally an escaped TeX character,
3) mathematical or monetary/trade in nature, or
4) not typically governed by ligatures or kerns.

The primary goal is to gain a complete set of Latin letters for as many European languages as possible. It adds Latvian, Lithuanian, Romanian, Maltese, Cornish, and Welsh to the linguistic repertoire covered by the EC encoding. It also improves Balkan language support by correctly distinguishing /Eth from /Dcroat.

This encoding will have obvious problems on most old, 256-character fonts. It is intended for `wide' Unicode fonts, such as Gentium, Minion, Myriad, modern Lucida implementations (such as Lucida Grande), and Latin Modern.

Comments (especially on layout) welcome! Taco, if you need the /cwm in there, what would you sacrifice?

Adam T. Lindsay
participants (5)
- Adam Lindsay
- Hans Hagen
- Mojca Miklavec
- Patrick Gundlach
- Taco Hoekwater