ec encoding and tcaron


Vit Zyka

13 Aug 2005

6:44 p.m.

Hi all,

allow me some pessimism. I have been working intensively on our 40-years scout group bulletin for more than a month. I have to solve many, many technical problems instead of focusing on the design. The Bulletin is rather complex, but many of the problems are 'simple'. I am starting to doubt whether ConTeXt (and therefore TeX generally) is a good tool for such typesetting. Detective debugging is good fun, but not when time is passing with no results and the to-do list of unwanted technical features keeps growing...

I have presented only some of these 'bugs' on this list; even that number probably looks as if I am only making trouble. Never mind, here is another problem that has puzzled me for 4 hours...

Please imagine a simple code in ISO-8859-2:

--------------------------------------
\enableregime[il2]
%\enableregime[latin2]
\usetypescript[modern][ec]
\setupbodyfont[10pt,rm]
\starttext
ťŤ % \tcaron\Tcaron
\WORD{ťŤ}
\stoptext
---------------------------------------

It produces an error:

! Undefined control sequence.
<to be read again>
¢

It comes from the first letter inside \WORD{...}. The same letters outside \WORD are typeset OK. So ť is not defined. But where should it be defined? The \tcaron definition is OK. Other characters with diacritics, e.g. \dcaron and \rcaron, are OK.

I looked into enco-ec, enco-il2 and regi-lat. The upper/lower mapping and the char codes seem to be OK. But the more I look into it, the less I understand it. There is no regi-il2 file, so I expect that the appropriate info comes from enco-il2. But any other (perhaps more appropriate) combination like \enableregime[latin2] \usetypescript[modern][ec] or \usetypescript[modern][il2] gives totally wrong glyphs.

I believe that the ť uppercase bug is also related to texfont skipping this letter when making pseudo-caps with

texfont --fontroot=X: --en=ec --ve=public --co=lm --source=auto --ca=0.8 lmbx10

So the problem is somewhere in the ec encoding.

Can somebody help, please? Thanks, and sorry for my bitterness - it is aimed at my own head.

vit


Hans Hagen

13 Aug

10:35 p.m.

Vit Zyka wrote:

...

Hi all,

allow me some pessimism. I have been working intensively on our 40-years scout group bulletin for more than a month. I have to solve many, many technical problems instead of focusing on the design. The Bulletin is rather complex, but many of the problems are 'simple'. I am starting to doubt whether ConTeXt (and therefore TeX generally) is a good tool for such typesetting. Detective debugging is good fun, but not when time is passing with no results and the to-do list of unwanted technical features keeps growing...

I have presented only some of these 'bugs' on this list; even that number probably looks as if I am only making trouble. Never mind, here is another problem that has puzzled me for 4 hours...

Please imagine a simple code in ISO-8859-2:

--------------------------------------
\enableregime[il2]
%\enableregime[latin2]
\usetypescript[modern][ec]
\setupbodyfont[10pt,rm]
\starttext
ťŤ % \tcaron\Tcaron
\WORD{ťŤ}

i see utf8 here -)

...

\stoptext
---------------------------------------

It produces an error:

! Undefined control sequence.
<to be read again>
¢

It comes from the first letter inside \WORD{...}. The same letters outside \WORD are typeset OK. So ť is not defined. But where should it be defined? The \tcaron definition is OK. Other characters with diacritics, e.g. \dcaron and \rcaron, are OK.

I looked into enco-ec, enco-il2 and regi-lat. The upper/lower mapping and the char codes seem to be OK. But the more I look into it, the less I understand it. There is no regi-il2 file, so I expect that the appropriate info comes from enco-il2. But any other (perhaps more appropriate) combination like \enableregime[latin2] \usetypescript[modern][ec] or \usetypescript[modern][il2] gives totally wrong glyphs.

I believe that the ť uppercase bug is also related to texfont skipping this letter when making pseudo-caps with

texfont --fontroot=X: --en=ec --ve=public --co=lm --source=auto --ca=0.8 lmbx10

So the problem is somewhere in the ec encoding.

Can somebody help, please? Thanks, and sorry for my bitterness - it is aimed at my own head.

first of all, i need zipped test files, since mailers mess around with encodings.

next, can you try the alpha release, since it has some fixes (many of the encodings were not complete in the sense of lc/uc mappings); also mojca made the latin encodings more complete

don't worry, it should work ok,

part of the problem is that the il2 encoding (font encoding) is rather useless and incomplete, but it happened to be the preferred one for czech cum suis (mostly computer modern related) and crossing language borders was not part of the game; (the same is true for the pl0 encoding for polish and the polish computer modern; but the qx encoding is supposed to handle both polish and czech etc ok)

it's actually even more messy when one looks into hyphenation, since most patterns are ec based (czech patterns can also handle il2), which is why context now ships with generic patterns that can be used in other font encodings as well;

regimes are just part of the input game: the active chars expand to named glyphs that themselves expand to characters; so, if something does not work as you expect, well, we should make it work.

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------

Vit Zyka

14 Aug

11:02 a.m.

Hans Hagen wrote:

...


\starttext
ťŤ % \tcaron\Tcaron
\WORD{ťŤ}

i see utf8 here -)

Yes, my e-mail client works in utf8. So that it can be read properly, I now attach a zip for testing.

...


first of all, i need zipped test files, since mailers mess around with encodings.

next, can you try the alpha release, since it has some fixes (many of the encodings were not complete in the sense of lc/uc mappings); also mojca made the latin encodings more complete

I am using the latest and greatest alpha with new formats and with newtexexec... (I got it at the beginning of the week for testing sorting.)

...

don't worry, it should work ok,

part of the problem is that the il2 encoding (font encoding) is rather useless and incomplete, but it happened to be the preferred one for czech cum suis (mostly computer modern related) and crossing language borders was not part of the game; (the same is true for the pl0 encoding for polish and the polish computer modern; but the qx encoding is supposed to handle both polish and czech etc ok)

I am using the ec font encoding, and il2 only as the input encoding (regime). I could use any other in xemacs, but il2 is there for historical reasons. Never mind, the problem is the same with utf8 input encoding, see the attachment. I believe the problem is not in il2 but somewhere in ec. A clue: where does texfont get the info about uppercase letters for the --ca switch (pseudo small caps)???

...

it's actually even more messy when one looks into hyphenation, since most patterns are ec based (czech patterns can also handle il2), which is why context now ships with generic patterns that can be used in other font encodings as well;

regimes are just part of the input game: the active chars expand to named glyphs that themselves expand to characters; so, if something does not work as you expect, well, we should make it work.

This theory is clear to me. Until this weekend I thought it was clear in practice, too. But now my char code travelling mechanism is broken :-(

Wait, an idea that comes from Mojca's example .... yes! The line

\enableregime[utf]

is enough, but in the case of il2 you should type

\input regi-lat
\enableregime[latin2]

How are people supposed to recognize that some regimes are preloaded and some are not? What about an error/warning message when a regime that is not loaded is used?

So regimes start to be clear again. The only question is: where does ConTeXt get the il2 mapping info if one types \enableregime[il2]??? (There is no 'il2' string in regi-*.tex; is it from enco-il2.tex?)

---

Please look in the attachment for the \tcaron problem.

vit
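
The chain discussed in this message (input byte, then regime, then named glyph, then font slot) can be sketched in plain TeX. This is only an illustration under stated assumptions, not ConTeXt's actual code: the two \def lines are hypothetical simplifications of what the regi-* and enco-* files set up, 0xBB is the ISO-8859-2 code of ť, and "B4 is assumed to be the tcaron slot in the EC (T1) font encoding.

```tex
% Plain TeX sketch; run with `tex` or `pdftex`. Nothing is typeset,
% the two \message lines only show the definitions on the terminal.
\catcode"BB=\active        % make the input byte 0xBB an active character
\def^^bb{\tcaron}          % "regime" step (assumed): input byte -> named glyph
\def\tcaron{\char"B4 }     % "encoding" step (assumed): named glyph -> font slot
\message{\meaning^^bb}     % shows that the byte expands to \tcaron
\message{\meaning\tcaron}  % shows that \tcaron expands to a \char number
\bye
```

Read this way, a regime that is not loaded simply leaves the first mapping undefined, which is consistent with the "Undefined control sequence" error reported at the start of the thread.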

Hans Hagen

13 Aug

11:31 p.m.

Vit Zyka wrote:

...

allow me some pessimism. I have been working intensively on our 40-years scout group bulletin for more than a month. I have to solve many, many technical problems instead of focusing on the design. The Bulletin is rather complex, but many of the problems are 'simple'. I am starting to doubt whether ConTeXt (and therefore TeX generally) is a good tool for such typesetting. Detective debugging is good fun, but not when time is passing with no results and the to-do list of unwanted technical features keeps growing...

(btw, magazines can be done quite well with columnsets)

Here is a patched WORD (core-fnt):

\chardef\uppercasemode\plusone % 0=ignore 1=normal 2=expand

\unexpanded\def\WORD#1%
  {\bgroup
   \the\everyuppercase
   \let\smallcapped\firstofoneargument
   \let\WORD\firstofoneargument
   \let\dochar\rawcharacter
   \ifcase\uppercasemode
     #1%
   \or
     % No expansion here, otherwise \getvalue problems! Default!!!
     %\edef\next{#1}% keep this to prevent roll back
     %\uppercase\expandafter{\next}% keep this to prevent roll back
     \uppercase{#1}%
   \or
     \expanded{\uppercase{#1}}% needed when in utf8
   \fi
   \egroup}

And here a patched \definecharacter (enco-ini)

\def\numcharacter#1{\char#1 }
\let\dochar\numcharacter

\def\definecharacter#1 #2 %
  {\ifundefined{#1}\setvalue{#1}{\dohandlecharacter{#1}}\fi
   \doifnumberelse{\string#2}
     {\setevalue{\characterprefix\characterencoding\string#1}%
        {\dochar{#2}}%
      \doautosetregime{#1}{#2}}
     {\setvalue{\characterprefix\characterencoding\string#1}{#2}}}

% goes on top of enco-utf

\prependtoks
  \doif\currentregime{utf}{\chardef\uppercasemode\plustwo}%
\to\everyuppercase

% \input enco-ini-new.tex

% \startmapping [ec]
%   \defineuppercasecom \something \nothing
% \stopmapping

\input enco-ec.tex % needed when no new format

\starttext

\enableregime[utf]
\usetypescript[modern][ec]
\setupbodyfont[10pt,rm]

ť Ť \ccaron

\WORD{ť Ť \ccaron}

\stoptext

I didn't test this with latin input; the trick is to let the named glyphs expand to a raw character which can then be uppercased by tex. quite dirty. It is dangerous to always do this, because in the case of written/reread data we cannot output raw characters, since they would be regimed again (this time in the wrong way).

Hans
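
Why expansion matters here can be shown with a small plain TeX sketch. It is an illustration only, not the ConTeXt implementation; \glyphnum and \glyphraw are hypothetical stand-ins for a named glyph defined as a \char number and as a raw character. \uppercase only maps explicit character tokens (via \uccode) and leaves control sequences alone, so a named glyph has to be expanded to a raw character before it can be uppercased.

```tex
% Plain TeX sketch; run with `tex` or `pdftex`. All macro names are hypothetical.
\def\glyphnum{\char"74 }% numeric definition: expands to \char and a number
\def\glyphraw{t}%         raw definition: expands to a character token

% 1. No expansion: \uppercase sees only control sequences and changes nothing.
\uppercase{\glyphnum\glyphraw}% typesets "tt"

% 2. Expand first, then uppercase: the raw character is mapped to "T".
\uppercase\expandafter{\glyphraw}% typesets "T"

% 3. Expanding the numeric form does not help: \char and the digits
%    of "74 have no uppercase mapping, so the same slot is typeset.
\uppercase\expandafter{\glyphnum}% typesets "t"
\bye
```

Cases 1 and 2 correspond roughly to \uppercasemode 1 (plain \uppercase{#1}) and 2 (\expanded{\uppercase{#1}}) in the patch above, and case 3 suggests why the patch switches \dochar to \rawcharacter inside \WORD.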

Vit Zyka

14 Aug

11:48 a.m.

Hans Hagen wrote:

...


(btw, magazines can be done quite well with columnsets)

Here is a patched WORD (core-fnt):

\chardef\uppercasemode\plusone % 0=ignore 1=normal 2=expand

\unexpanded\def\WORD#1%
  {\bgroup
   \the\everyuppercase
   \let\smallcapped\firstofoneargument
   \let\WORD\firstofoneargument
   \let\dochar\rawcharacter
   \ifcase\uppercasemode
     #1%
   \or
     % No expansion here, otherwise \getvalue problems! Default!!!
     %\edef\next{#1}% keep this to prevent roll back
     %\uppercase\expandafter{\next}% keep this to prevent roll back
     \uppercase{#1}%
   \or
     \expanded{\uppercase{#1}}% needed when in utf8
   \fi
   \egroup}

And here a patched \definecharacter (enco-ini)

\def\numcharacter#1{\char#1 }
\let\dochar\numcharacter

\def\definecharacter#1 #2 %
  {\ifundefined{#1}\setvalue{#1}{\dohandlecharacter{#1}}\fi
   \doifnumberelse{\string#2}
     {\setevalue{\characterprefix\characterencoding\string#1}%
        {\dochar{#2}}%
      \doautosetregime{#1}{#2}}
     {\setvalue{\characterprefix\characterencoding\string#1}{#2}}}

% goes on top of enco-utf

\prependtoks
  \doif\currentregime{utf}{\chardef\uppercasemode\plustwo}%
\to\everyuppercase

% \input enco-ini-new.tex

% \startmapping [ec]
%   \defineuppercasecom \something \nothing
% \stopmapping

\input enco-ec.tex % needed when no new format

\starttext

\enableregime[utf]
\usetypescript[modern][ec]
\setupbodyfont[10pt,rm]

ť Ť \ccaron

\WORD{ť Ť \ccaron}

\stoptext

Yes, Hans, it works in utf8. In the case of il2 it works too if expansion is also done:

\prependtoks
  \doif\currentregime{utf}{\chardef\uppercasemode\plustwo}%
  \doif\currentregime{il2}{\chardef\uppercasemode\plustwo}%
\to\everyuppercase

---

Actually, I discovered the source of the problem with \tcaron! There is a file enco-ecm.tex with some exceptions, and in it there is

\definecharacter tcaron {\buildtextaccent\textcaron t}

If I comment out this line, expansion is not needed. I suggest omitting it, since \tcaron is now present in lm.

But I do not use \WORD (it was only a product of my debugging); I use pseudo caps, and there the problem persists. Files are attached, and the command is

texfont --fontroot=X: --en=ec --ve=public --co=lm --source=auto --ca=0.8 lmbx10

...

I didn't test this with latin input; the trick is to let the named glyphs expand to a raw character which can then be uppercased by tex. quite dirty. It is dangerous to always do this, because in the case of written/reread data we cannot output raw characters, since they would be regimed again (this time in the wrong way).

Hans, did you think about Petr Olsak's enc-tex? I believe it is a much more straightforward solution that solves input encoding, output to files and output to the log (utf8 works too). And it should be much quicker than macros. I am not sure, but perhaps there is no patch for aleph yet...

vit

Mojca Miklavec

12:12 p.m.

Vit Zyka wrote:

...

Actually, I discovered the source of the problem with \tcaron! There is a file enco-ecm.tex with some exceptions, and in it there is

\definecharacter tcaron {\buildtextaccent\textcaron t}

If I comment out this line, expansion is not needed. I suggest omitting it, since \tcaron is now present in lm.

If and only if you work with lm & ec. Otherwise building of accents is quite useful. (How can I get ec in Antiqwa?)

...

But I do not use \WORD (it was only a product of my debugging); I use pseudo caps, and there the problem persists. Files are attached, and the command is texfont --fontroot=X: --en=ec --ve=public --co=lm --source=auto --ca=0.8 lmbx10

Another *extremely* strange observation. In a document with ec encoding and some accented characters, searching for 'č' simply doesn't work. I don't understand why. I know very little about PDF, but in the resulting document this line was present:

/CharSet (/breve/one/D/U/Y/u/Ccaron/Scaron/Tcaron/Zcaron/ccaron/tcaron)

with more or less only the characters I used. The line seems to be OK, ccaron seems to be present. Searching for 'š' works as expected (even lower/uppercase is recognised), but at the place of 'č' only c is recognised (if I copy-paste, only c remains at that place). I thought that it was only Acrobat's fault, but searching for the same letter in another document worked OK (documentation for Antiqwa for example).

Minimal example:

\usetypescript[modern][ec]
\setupbodyfont[10pt,rm]
\starttext
\ccaron\scaron
\stoptext

and then either searching or converting to plain text.

Mojca

Vit Zyka

4:59 p.m.

Mojca Miklavec wrote:

...


Another *extremely* strange observation. In a document with ec encoding and some accented characters, searching for 'č' simply doesn't work. I don't understand why. I know very little about PDF, but in the resulting document there was this line present:

/CharSet (/breve/one/D/U/Y/u/Ccaron/Scaron/Tcaron/Zcaron/ccaron/tcaron)

with more or less only the characters I used. The line seems to be OK, ccaron seems to be present. Searching for 'š' works as expected (even lower/uppercase is recognised), but at the place of 'č' only c is recognised (if I copy-paste, only c remains at that place). I thought that it was only Acrobat's fault, but searching for the same letter in another document worked OK (documentation for Antiqwa for example).

Minimal example:

\usetypescript[modern][ec]
\setupbodyfont[10pt,rm]
\starttext
\ccaron\scaron
\stoptext

Just a comment: there should be a CMAP resource in the PDF that maps the font encoding to unicode. Then searching and copying work. ConTeXt supports CMAP, but AFAIK only for the il2 encoding! CMAP resources for the other encodings are missing. Just see enco-pfr.tex

vit

Taco Hoekwater

5:12 p.m.

Vit Zyka wrote:

...

Just a comment:

there should be a CMAP resource in the PDF that maps the font encoding to unicode. Then searching and copying work. ConTeXt supports CMAP, but AFAIK only for the il2 encoding! CMAP resources for the other encodings are missing. Just see enco-pfr.tex

From the Release notes:

Context 2005.07.27

This version of ConTeXt was uploaded on July 28, 2005.

* .....
* optionally make ec-encoded pdfs better searchable in Acrobat5 and lower

See also this thread, only a few weeks ago:

http://archive.contextgarden.net/message/20050727.073731.41799d96.html

Cheers, Taco

Mojca Miklavec

6:18 p.m.

Taco Hoekwater wrote:

...


From the Release notes:

Context 2005.07.27

This version of ConTeXt was uploaded on July 28, 2005.

* .....
* optionally make ec-encoded pdfs better searchable in Acrobat5 and lower

See also this thread, only a few weeks ago:

http://archive.contextgarden.net/message/20050727.073731.41799d96.html

Cheers, Taco

Thank you both! I remember the thread, but at that time searching for letters in ligatures worked for me, and I never noticed that anything was wrong with texnansi until a week ago :) as Hans's tricks made the letters look all right. So I didn't understand the point of that hack.

I added the comment about pfr to the Wiki (encodings and regimes).

(Would it be very expensive to add this feature to the ec encoding automatically?)

Thanks, Mojca

Hans Hagen

7:19 p.m.

Mojca Miklavec wrote:

...

(Would it be very expensive to add this feature to the ec encoding automatically?)

no, quite cheap; i'll cut you a deal ... make me the texnansi and qx variants and i'll enable all of it automatically

Hans

Hans Hagen

15 Aug

12:51 p.m.

Mojca Miklavec wrote:

...

(Would it be very expensive to add this feature to the ec encoding automatically?)

there is a problem

(1) loading needs to be carefully synchronized
(2) we don't want dummy vectors when we have a no-text document

so we may end up with manual activation

Hans

Hans Hagen

14 Aug

9:12 p.m.

Vit Zyka wrote:

...

Hans, did you think about Petr Olsak's enc-tex? I believe it is a much more straightforward solution that solves input encoding, output to files and output to the log (utf8 works too). And it should be much quicker than macros. I am not sure, but perhaps there is no patch for aleph yet...

i took a look at it when it came around but it's not flexible enough; it deals with the input and part of the output; i asked the author to consider some extensions that would make it possible to deal with multiple output variants (keep in mind that some input data may need to be fed into pdf specific data structures, in different encodings, with different kinds of escapes etc) and that in for instance buffers, verbatim, multi-pass data and other situations one may need the original input; it all depends on how a macro package is built and deals with data and also on the complexity of the applications. Life is simply not simple. What works for latex or plain tex may not work for context and vice versa. So, when i noticed the 'just use it and don't ask' attitude i gave up on enctex or at least decided to postpone support till i ran into a

Of course anyone is free to overload the relevant pieces of the regime handler. I suppose that putting some enctex mapper in front of it works ok (i.e. as long as one either expands to raw characters in the current encoding or to named glyphs).

now back to the issue:

I uploaded a new version (while crossing my fingers); the problem with this uppercasing is that handling (font) characters is one thing, handling inputs another, not to speak of commands and other things that can end up inside WORD and then be subjected to uppercasing

\def\Vit{Vit} \WORD{\Vit} -> VIT
\def\Vit{whatever} \WORD{\Vit} -> WHATEVER

\startencoding[ec]
  \defineuppercasecom \Vit {nothing}
\stopencoding

\def\Vit{whatever} \WORD{\Vit} -> nothing

so, there is much more involved; i changed a few things (i fear that taco is the only one who now understands what's going on, but you may give it a try -)

(\chardef\uppercasemode\plusone gives the old behaviour)

The problem with this kind of thing is that it will not become easier to solve when we have a different tex or so ... there are simply too many variants / usages / ...

Hans

Vit Zyka

11:58 p.m.

Hans Hagen wrote:

...


i took a look at it when it came around but it's not flexible enough; it deals with the input and part of the output; i asked the author to consider some extensions that would make it possible to deal with multiple output variants (keep in mind that some input data may need to be fed into pdf specific data structures, in different encodings, with different kinds of escapes etc) and that in for instance buffers, verbatim, multi-pass data and other situations one may need the original input; it all depends on how a macro package is built and deals with data and also on the complexity of the applications. Life is simply not simple. What works for latex or plain tex may not work for context and vice versa. So, when i noticed the 'just use it and don't ask' attitude i gave up on

I think I understand...

...

enctex or at least decided to postpone support till i ran into a

Of course anyone is free to overload the relevant pieces of the regime handler. I suppose that putting some enctex mapper in front of it works ok (i.e. as long as one either expands to raw characters in the current encoding or to named glyphs).

now back to the issue:

I uploaded a new version (while crossing my fingers); the problem with this uppercasing is that handling (font) characters is one thing, handling inputs another, not to speak of commands and other things that can end up inside WORD and then be subjected to uppercasing

\def\Vit{Vit} \WORD{\Vit} -> VIT
\def\Vit{whatever} \WORD{\Vit} -> WHATEVER

\startencoding[ec]
  \defineuppercasecom \Vit {nothing}
\stopencoding

\def\Vit{whatever} \WORD{\Vit} -> nothing

so, there is much more involved; i changed a few things (i fear that taco is the only one who now understands what's going on, but you may give it a try -)

(\chardef\uppercasemode\plusone gives the old behaviour)

The problem with this kind of thing is that it will not become easier to solve when we have a different tex or so ... there are simply too many variants / usages / ...

All right, thanks Hans, I will try the new alpha, but what about the texfont pseudo-caps problem? Is it relevant to this one? I have come to think it is a different and much simpler problem...

vit

Hans Hagen

15 Aug

1:07 a.m.

Vit Zyka wrote:

...

All right, thanks Hans, I will try the new alpha, but what about the texfont pseudo-caps problem? Is it relevant to this one? I have come to think it is a different and much simpler problem...

i think that they are not loaded at all, adam may know (in a few weeks i'll pick up a thread about this var stuff since we have some ideas on how to make it more convenient)

anyhow, there is no need to redefine all mappings, since the existing typescripts will be taken as well, so:

\starttypescript [all] [latin-modern] [texnansi,ec,qx,pl0,il2,t5]
  \definefontsynonym [cmcsci10]   [\typescriptthree-lmri10-capitalized-800]  [encoding=\typescriptthree]
  \definefontsynonym [cmcscbx10]  [\typescriptthree-lmbx10-capitalized-800]  [encoding=\typescriptthree]
  \definefontsynonym [cmcscbxi10] [\typescriptthree-lmbxi10-capitalized-800] [encoding=\typescriptthree]
  \definefontsynonym [cmcscro10]  [\typescriptthree-lmro10-capitalized-800]  [encoding=\typescriptthree]
  \definefontsynonym [cmcsctti10] [\typescriptthree-lmtti10-capitalized-800] [encoding=\typescriptthree]
  \definefontsynonym [cmcsctto10] [\typescriptthree-lmtto10-capitalized-800] [encoding=\typescriptthree]
  \definefontsynonym [cmcscss10]  [\typescriptthree-lmss10-capitalized-800]  [encoding=\typescriptthree]
  \definefontsynonym [cmcscssbx10][\typescriptthree-lmssbx10-capitalized-800][encoding=\typescriptthree]
  \definefontsynonym [cmcscssi10] [\typescriptthree-lmsso10-capitalized-800] [encoding=\typescriptthree]
\stoptypescript

\starttypescript [serif] [latin-modern] [default]
  \definefontsynonym [SerifSmallCaps]          [ComputerModern-Caps]
  \definefontsynonym [SerifItalicSmallCaps]    [cmcsci10]
  \definefontsynonym [SerifBoldSmallCaps]      [cmcscbx10]
  \definefontsynonym [SerifBoldItalicSmallCaps][cmcscbxi10]
  \definefontsynonym [SerifSlantedSmallCaps]   [cmcscro10]
\stoptypescript

\starttypescript [sans] [latin-modern] [default]
  \definefontsynonym [SansSmallCaps]       [cmcscss10]
  \definefontsynonym [SansItalicSmallCaps] [cmcscssi10]
  \definefontsynonym [SansBoldSmallCaps]   [cmcscssbx10]
  \definefontsynonym [SansSlantedSmallCaps][cmcscssi10]
\stoptypescript

\starttypescript [mono] [latin-modern] [default]
  \definefontsynonym [MonoSmallCaps]      [ComputerModernMono-Caps]
  \definefontsynonym [MonoItalicSmallCaps][cmcsctti10]
\stoptypescript

(you can now add these typescripts to the main file, no need for a separate typescript file)

another approach is:

\starttypescript [all] [latin-modern-sc] [texnansi,ec,qx,pl0,il2,t5]
  \definefontsynonym [cmcsci10]   [\typescriptthree-lmri10-capitalized-800]  [encoding=\typescriptthree]
  \definefontsynonym [cmcscbx10]  [\typescriptthree-lmbx10-capitalized-800]  [encoding=\typescriptthree]
  \definefontsynonym [cmcscbxi10] [\typescriptthree-lmbxi10-capitalized-800] [encoding=\typescriptthree]
  \definefontsynonym [cmcscro10]  [\typescriptthree-lmro10-capitalized-800]  [encoding=\typescriptthree]
  \definefontsynonym [cmcsctti10] [\typescriptthree-lmtti10-capitalized-800] [encoding=\typescriptthree]
  \definefontsynonym [cmcsctto10] [\typescriptthree-lmtto10-capitalized-800] [encoding=\typescriptthree]
  \definefontsynonym [cmcscss10]  [\typescriptthree-lmss10-capitalized-800]  [encoding=\typescriptthree]
  \definefontsynonym [cmcscssbx10][\typescriptthree-lmssbx10-capitalized-800][encoding=\typescriptthree]
  \definefontsynonym [cmcscssi10] [\typescriptthree-lmsso10-capitalized-800] [encoding=\typescriptthree]
\stoptypescript

\starttypescript [serif] [latin-modern-sc] [name]
  \definefontsynonym [Serif]          [ComputerModern-Caps]
  \definefontsynonym [SerifItalic]    [cmcsci10]
  \definefontsynonym [SerifBold]      [cmcscbx10]
  \definefontsynonym [SerifBoldItalic][cmcscbxi10]
  \definefontsynonym [SerifSlanted]   [cmcscro10]
\stoptypescript

\starttypescript [sans] [latin-modern-sc] [name]
  \definefontsynonym [Sans]       [cmcscss10]
  \definefontsynonym [SansItalic] [cmcscssi10]
  \definefontsynonym [SansBold]   [cmcscssbx10]
  \definefontsynonym [SansSlanted][cmcscssi10]
\stoptypescript

\starttypescript [mono] [latin-modern-sc] [name]
  \definefontsynonym [Mono]       [ComputerModernMono-Caps]
  \definefontsynonym [MonoItalic] [cmcsctti10]
\stoptypescript

\starttypescript[map][latin-modern-sc][all]
  ...
\stoptypescript

\definetypeface[modernsc][rm][serif][latin-modern-sc][default]

.....

{\modernsc test}

Vit Zyka

11:39 a.m.

Hans Hagen wrote:

...


i think that they are not loaded at all, adam may know (in a few weeks i'll pick up a thread about this var stuff since we have some ideas on how to make in more convenient)

anyhow, there is no need to redefine all mappings, since the existing typescripts will be taken as well, so:

Dear Hans,

I do not know whether my description is so fuzzy or we are both a bit overworked ;-) Yesterday I sent the list a zipped minimal example and typescripts for pseudo small caps. They are based on font variants and work perfectly.

The ONLY PROBLEM (demonstrated in the minimal example) is that texfont ignores tcaron when creating the virtual font. Every other similarly accented letter (e.g. \dcaron) is OK.

And since I cannot find it in the texfont source, I ask again: WHERE/HOW does texfont get the information about corresponding (lowercase -> uppercase) character pairs?

vit

...

\starttypescript [all] [latin-modern] [texnansi,ec,qx,pl0,il2,t5]
  \definefontsynonym [cmcsci10]   [\typescriptthree-lmri10-capitalized-800]  [encoding=\typescriptthree]
  \definefontsynonym [cmcscbx10]  [\typescriptthree-lmbx10-capitalized-800]  [encoding=\typescriptthree]
  \definefontsynonym [cmcscbxi10] [\typescriptthree-lmbxi10-capitalized-800] [encoding=\typescriptthree]
  \definefontsynonym [cmcscro10]  [\typescriptthree-lmro10-capitalized-800]  [encoding=\typescriptthree]
  \definefontsynonym [cmcsctti10] [\typescriptthree-lmtti10-capitalized-800] [encoding=\typescriptthree]
  \definefontsynonym [cmcsctto10] [\typescriptthree-lmtto10-capitalized-800] [encoding=\typescriptthree]
  \definefontsynonym [cmcscss10]  [\typescriptthree-lmss10-capitalized-800]  [encoding=\typescriptthree]
  \definefontsynonym [cmcscssbx10][\typescriptthree-lmssbx10-capitalized-800][encoding=\typescriptthree]
  \definefontsynonym [cmcscssi10] [\typescriptthree-lmsso10-capitalized-800] [encoding=\typescriptthree]
\stoptypescript

\starttypescript [serif] [latin-modern] [default]
  \definefontsynonym [SerifSmallCaps]          [ComputerModern-Caps]
  \definefontsynonym [SerifItalicSmallCaps]    [cmcsci10]
  \definefontsynonym [SerifBoldSmallCaps]      [cmcscbx10]
  \definefontsynonym [SerifBoldItalicSmallCaps][cmcscbxi10]
  \definefontsynonym [SerifSlantedSmallCaps]   [cmcscro10]
\stoptypescript

\starttypescript [sans] [latin-modern] [default]
  \definefontsynonym [SansSmallCaps]       [cmcscss10]
  \definefontsynonym [SansItalicSmallCaps] [cmcscssi10]
  \definefontsynonym [SansBoldSmallCaps]   [cmcscssbx10]
  \definefontsynonym [SansSlantedSmallCaps][cmcscssi10]
\stoptypescript

\starttypescript [mono] [latin-modern] [default]
  \definefontsynonym [MonoSmallCaps]       [ComputerModernMono-Caps]
  \definefontsynonym [MonoItalicSmallCaps][cmcsctti10]
\stoptypescript

(you can now add these typescripts to the main file, no need for a separate typescript file)

another approach is:

\starttypescript [all] [latin-modern-sc] [texnansi,ec,qx,pl0,il2,t5]
  \definefontsynonym [cmcsci10]   [\typescriptthree-lmri10-capitalized-800]  [encoding=\typescriptthree]
  \definefontsynonym [cmcscbx10]  [\typescriptthree-lmbx10-capitalized-800]  [encoding=\typescriptthree]
  \definefontsynonym [cmcscbxi10] [\typescriptthree-lmbxi10-capitalized-800] [encoding=\typescriptthree]
  \definefontsynonym [cmcscro10]  [\typescriptthree-lmro10-capitalized-800]  [encoding=\typescriptthree]
  \definefontsynonym [cmcsctti10] [\typescriptthree-lmtti10-capitalized-800] [encoding=\typescriptthree]
  \definefontsynonym [cmcsctto10] [\typescriptthree-lmtto10-capitalized-800] [encoding=\typescriptthree]
  \definefontsynonym [cmcscss10]  [\typescriptthree-lmss10-capitalized-800]  [encoding=\typescriptthree]
  \definefontsynonym [cmcscssbx10][\typescriptthree-lmssbx10-capitalized-800][encoding=\typescriptthree]
  \definefontsynonym [cmcscssi10] [\typescriptthree-lmsso10-capitalized-800] [encoding=\typescriptthree]
\stoptypescript

\starttypescript [serif] [latin-modern-sc] [name]
  \definefontsynonym [Serif]          [ComputerModern-Caps]
  \definefontsynonym [SerifItalic]    [cmcsci10]
  \definefontsynonym [SerifBold]      [cmcscbx10]
  \definefontsynonym [SerifBoldItalic][cmcscbxi10]
  \definefontsynonym [SerifSlanted]   [cmcscro10]
\stoptypescript

\starttypescript [sans] [latin-modern-sc] [name]
  \definefontsynonym [Sans]       [cmcscss10]
  \definefontsynonym [SansItalic] [cmcscssi10]
  \definefontsynonym [SansBold]   [cmcscssbx10]
  \definefontsynonym [SansSlanted][cmcscssi10]
\stoptypescript

\starttypescript [mono] [latin-modern-sc] [name]
  \definefontsynonym [Mono]       [ComputerModernMono-Caps]
  \definefontsynonym [MonoItalic] [cmcsctti10]
\stoptypescript

\starttypescript[map][latin-modern-sc][all]
  ...
\stoptypescript

\definetypeface[modernsc][rm][serif][latin-modern-sc][default]

.....

{\modernsc test}


_______________________________________________ ntg-context mailing list ntg-context@ntg.nl http://www.ntg.nl/mailman/listinfo/ntg-context

-- ======================================================= Ing. Vít Zýka, Ph.D. TYPOkvítek database publishing databazove publikovani data maintaining and typesetting in typographic quality priprava dat a jejich sazba v typograficke kvalite tel.: (+420) 777 198 189 www: http://typokvitek.com =======================================================

Hans Hagen

1:09 p.m.

Vit Zyka wrote:

...

And since I cannot find it in the texfont source, I ask again: WHERE/HOW does texfont get the information about corresponding (lowercase -> uppercase) character pairs?

this is not done by texfont, but by afm2tfm, and i guess that it's kind of hard coded in there (taco or patrick may know)

an alternative is to copy for instance ec.enc to ec-cap.enc and change the lowercase entries to uppercase ones (we can add such files to the distribution); then you use texfont with that encoding so you get ec-cap-whatever kind of font files

Hans
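The ec.enc to ec-cap.enc idea can be tried out with a small script. What follows is only a rough sketch in Python, not part of texfont or of any distribution: the function name `capitalize_vector` and the `OVERRIDES` table are made up for illustration, and the script assumes the usual `/Name [ /glyph ... ] def` layout of an .enc file with no `/` inside comments. A lowercase name is swapped only when its capitalized (or fully uppercased) form already occurs in the same vector; everything else needs an explicit override, such as the tquoteright/Tcaron pair discussed in this thread.

```python
import re

# Hypothetical override table: lowercase names whose capital cannot be
# derived from the name itself. The single entry follows this thread.
OVERRIDES = {"tquoteright": "Tcaron"}

GLYPH = re.compile(r"/([A-Za-z0-9._]+)")


def capitalize_vector(enc_text, overrides=OVERRIDES):
    """Return enc_text with lowercase glyph names inside the [...] vector
    replaced by their uppercase counterparts, where one is known."""
    start, end = enc_text.index("["), enc_text.rindex("]")
    vector = enc_text[start:end]
    known = set(GLYPH.findall(vector))

    def swap(match):
        name = match.group(1)
        if name in overrides:
            return "/" + overrides[name]
        # Try "aacute" -> "Aacute" first, then "ae" -> "AE".
        for candidate in (name[:1].upper() + name[1:], name.upper()):
            if candidate in known:
                return "/" + candidate
        return "/" + name  # no capital known: leave the slot unchanged

    return enc_text[:start] + GLYPH.sub(swap, vector) + enc_text[end:]


if __name__ == "__main__":
    sample = "/ECEncoding [\n/tquoteright /dcaron /Dcaron /Tcaron /ae /AE /quoteright\n] def\n"
    print(capitalize_vector(sample))
```

Names such as germandbls have no capital that can be found this way, so they stay as they are unless added to the override table.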

Patrick Gundlach

2:17 p.m.

...

\usetypescript[modern][ec]
\setupbodyfont[10pt,rm]

\starttext
ťŤ % \tcaron\Tcaron
\WORD{ťŤ}

In the ec encoding there is no tcaron, only tquoteright, which I guess is meant to be the lowercase of Tcaron. When faking small caps, afm2tfm uses the C function tolower(), which fails the way afm2tfm uses it: it assumes that the lowercase of Tcaron is tcaron, which is not present in ec.enc and therefore does not get into the font encoding.

Did this help? I am just stepping into this thread, so I might have missed the whole point.

Patrick
--
ConTeXt wiki and more: http://contextgarden.net
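The name-based pairing described above can be illustrated with a toy model. This is a sketch in Python under stated assumptions, not afm2tfm's actual code: the function `missing_lowercase` is hypothetical, and the glyph list is a hand-picked excerpt of names mentioned in this thread rather than the full ec vector. It derives a lowercase name from each capital name and reports the capitals whose derived partner is absent.

```python
def missing_lowercase(encoding_names):
    """Map each capital glyph name to its naively derived lowercase name
    whenever that lowercase name is absent from the encoding."""
    names = set(encoding_names)
    missing = {}
    for name in sorted(names):
        if not name[:1].isupper():
            continue
        # "AE" -> "ae", "Tcaron" -> "tcaron"
        guess = name.lower() if name.isupper() else name[:1].lower() + name[1:]
        if guess not in names:
            missing[name] = guess
    return missing


# Excerpt only; in ec the slot for the lowercase letter is called tquoteright.
ec_excerpt = ["Dcaron", "dcaron", "Rcaron", "rcaron", "Tcaron", "tquoteright"]
print(missing_lowercase(ec_excerpt))  # {'Tcaron': 'tcaron'}
```

In this model Dcaron and Rcaron find their partners, while Tcaron does not, which matches the symptom that only this letter is skipped.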

Vit Zyka

11:10 p.m.

Patrick Gundlach wrote:

...

...
\usetypescript[modern][ec]
\setupbodyfont[10pt,rm]

\starttext
ťŤ % \tcaron\Tcaron
\WORD{ťŤ}

In the ec encoding there is no tcaron, only tquoteright, which I guess is meant to be the lowercase of Tcaron. When faking small caps, afm2tfm uses the C function tolower(), which fails the way afm2tfm uses it: it assumes that the lowercase of Tcaron is tcaron, which is not present in ec.enc and therefore does not get into the font encoding.

Did this help? I am just stepping into this thread, so I might have missed the whole point.

No, Patrick, you hit the mark! But after Hans's last note it was easy even for me ;-)

It is some kind of inconsistency/bug in the ec encoding: if the equivalent /dcaron is present, then there should be /tcaron too, instead of /tquoteright.

My problem is solved now, thanks to Mojca, Hans, and Patrick.

vit

Patrick Gundlach

11:21 p.m.

Hi Vit,

...

It is some kind of inconsistency/bug in the ec encoding: if the equivalent /dcaron is present, then there should be /tcaron too, instead of /tquoteright.

So, how do the fonts name the glyph? You can use the tex256 encoding. See http://fun.contextgarden.net/encodingtable/enctable.rb?ec,tex256

Patrick

Vit Zyka

16 Aug

10:51 a.m.

Patrick Gundlach wrote:

...

Hi Vit,

...
It is some kind of inconsistency/bug in the ec encoding: if the equivalent /dcaron is present, then there should be /tcaron too, instead of /tquoteright.

So, how do the fonts name the glyph?

Looking into my font/afm directory:

afm>grep -S -l "tcaron" *.afm | perl -n -e "$i=0;while(<>){$i++;}print\"$i\n\""
761
afm>grep -S -l "tquoteright" *.afm | perl -n -e "$i=0;while(<>){$i++;}print\"$i\n\""
69

But of these 69, 53 are from lm, and in lm this glyph is doubled (both tcaron and tquoteright):

lmr10.afm: C -1 ; WX 388.88889 ; N tquoteright ; B 19 -11 332 699 ;
lmr10.afm: C -1 ; WX 388.88889 ; N tcaron ; B 19 -11 332 699 ;

The remaining 16 files are 8x .\ibm\courier\*.afm and 8x .\ibm\times\*.afm; even there both glyphs are present, but with different metrics:

grep -S "tcaron" cour.afm
C -1 ; WX 600 ; N tcaron ; B 94 -14 538 720 ;
CC tcaron 2 ; PCC t 0 0 ; PCC caron -77 92 ;

grep -S "tquoteright" cour.afm
C -1 ; WX 600 ; N tquoteright ; B 73 -14 646 563 ;

So it seems that fonts are using /tcaron; lm provides /tquoteright as well, presumably just to be safe.

Does anybody know who is responsible for the ec encoding, or what its background is?

vit
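For anyone who wants to repeat this count without grep and perl, here is a rough Python sketch. The function `afm_files_with_glyph` is hypothetical, it looks only at `*.afm` files directly inside the given directory (no subdirectories), and it matches a glyph definition exactly (an `N <name> ;` field). Because it does not match the name as a substring, its counts may differ from the grep figures quoted above.

```python
import re
from pathlib import Path


def afm_files_with_glyph(directory, glyph):
    """Return the sorted names of *.afm files in `directory` that define
    a glyph called exactly `glyph` (an 'N <name> ;' field)."""
    pattern = re.compile(r"\bN\s+" + re.escape(glyph) + r"\s*;")
    hits = []
    for path in sorted(Path(directory).glob("*.afm")):
        # Undecodable bytes are replaced instead of aborting the scan.
        if pattern.search(path.read_text(errors="replace")):
            hits.append(path.name)
    return hits


if __name__ == "__main__":
    for name in ("tcaron", "tquoteright"):
        print(name, len(afm_files_with_glyph(".", name)))
```

Files that show up in both lists are the ones that carry the glyph under both names, as lm does.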

Patrick Gundlach

10:57 a.m.

[trightquote/tcaron]

...

So it seems that fonts are using /tcaron; lm provides /tquoteright as well, presumably just to be safe.

ok, thanks

...

Does anybody know who is responsible for the ec encoding, or what its background is?

Last time I asked about the encodings (xt2.enc in particular) I got the following answer from Thomas Esser:

...

That file comes from the fontname distribution (CTAN:info/fontname). If you want to request a change in that file, please write either to the tex-fonts list or to Karl Berry.

Patrick

Hans Hagen

15 Aug

11:52 p.m.

Vit Zyka wrote:

...

No, Patrick, you hit the mark! But after Hans's last note it was easy even for me ;-)

It is some kind of inconsistency/bug in the ec encoding: if the equivalent /dcaron is present, then there should be /tcaron too, instead of /tquoteright.

My problem is solved now, thanks to Mojca, Hans, and Patrick.

well, since mojca wants some additional glyphs as well, why not make an ecx.enc file that fixes these things; actually, you only need that file for generating metrics and can rename the ecx-* files to ec-* afterwards; we can tweak texfont to use a different encoding name and output name if needed.

Hans


participants (5)

Hans Hagen
Mojca Miklavec
Patrick Gundlach
Taco Hoekwater
Vit Zyka
