Thank you for your patience Jim.
Try this at a shell prompt:
env LANG=C LC_ALL=C cat --show-all FileName
where FileName is the file in question. The non-ascii characters will be output as strings that look like M-? where ? is a single ascii character. If you see a single M-? triplet in place of each non-ascii character you do not have utf-8. If you see between two and five such triplets for each non-ascii character in the document it is probably utf-8. (If you see ^@ pairs separating the ascii chars you have utf-16.)
Okay, this gives me some comfort as it seems to confirm that I do have UTF-8 as I thought. I'm seeing twos, threes and fours of the triplets you describe, and no evidence of high-ascii single chars nor of ^@. So I'm pretty sure it is UTF-8. Thanks for this.
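For anyone who would rather not count M-? groups by eye, the same check can be roughly sketched in Python. This is only a heuristic under stated assumptions, not a real encoding detector: the function name guess_encoding and its return labels are made up for illustration, and it treats any NUL byte as a sign of utf-16. Note that current UTF-8 (RFC 3629) uses two to four bytes for each non-ASCII character, which matches the twos, threes and fours seen above.

```python
def guess_encoding(data: bytes) -> str:
    """Rough guess that mirrors the manual 'cat --show-all' check.

    Hypothetical helper for illustration only; the labels are arbitrary.
    """
    # A UTF-16 byte order mark, or NUL bytes between ASCII characters
    # (shown as ^@ by cat --show-all), suggests utf-16.
    if data.startswith((b"\xff\xfe", b"\xfe\xff")) or b"\x00" in data:
        return "utf-16"
    # No bytes above 0x7F at all: plain ASCII, nothing to decide.
    if all(byte < 0x80 for byte in data):
        return "ascii"
    # Non-ASCII bytes that form valid multi-byte sequences: probably utf-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Lone high bytes, as in latin-1 or similar 8-bit encodings.
        return "other"
    return "utf-8"


# Usage, assuming FileName is the file in question:
#     from pathlib import Path
#     print(guess_encoding(Path("FileName").read_bytes()))
```

A file that decodes cleanly as UTF-8 and contains non-ASCII bytes is very unlikely to be in an 8-bit encoding, which is the same reasoning as counting the groups by hand.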
I've only tested on tetex-3. That may make a difference....
I think maybe it does. Is there anyone who is running the *minimal install* (from Hans' zip files) on either Windows or Linux who could test this for me? I just need you to try out a Unicode accented character within an <mtext> element inside MathML. Here's my template again - put a Unicode accented char where 'HERE' appears:
\useXMLfilter[utf]\usemodule[mathml]
\starttext\startXMLdata
<formula><math><mtext>HERE</mtext></math></formula>
\stopXMLdata\stoptext
You may want to give TeX-Live a test.
It's usually my first port of call, but AFAIK it's not possible to control the way the web browser re-encodes stuff before it is submitted, so the results are not reliable. This is a real shame - TeX-Live is how I usually confirm all my queries.
Thanks again Jim; can anyone running the minimal install help me?
Duncan
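One way to take the editor and the browser out of the picture is to generate the test file with the encoding stated explicitly. The following is a minimal sketch under stated assumptions: the helper names and the output file name mtext-test.tex are hypothetical, and the template is the one from the message above with 'HERE' replaced by the character under test.

```python
from pathlib import Path

# The template from the message above, one source line per list entry.
TEMPLATE = "\n".join([
    r"\useXMLfilter[utf]\usemodule[mathml]",
    r"\starttext\startXMLdata",
    "<formula><math><mtext>HERE</mtext></math></formula>",
    r"\stopXMLdata\stoptext",
]) + "\n"


def build_test_source(text: str = "\u00e9") -> bytes:
    """Return the template with HERE replaced, encoded as UTF-8 bytes."""
    return TEMPLATE.replace("HERE", text).encode("utf-8")


def write_test_file(path: str, text: str = "\u00e9") -> None:
    """Write the UTF-8 encoded test source to path (hypothetical file name)."""
    Path(path).write_bytes(build_test_source(text))


# Usage:
#     write_test_file("mtext-test.tex", "\u00e9")   # e with acute accent
```

Because the bytes are produced by an explicit encode("utf-8") call, whoever runs the test knows exactly what reaches ConTeXt, whatever their editor or mail client does with the character.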
I attached a small test file. Some trickery is needed to get utf working in mathml - the map patch goes into xtag-map.tex - the other one into xtag-mmp.
Part of the problem is that the current font must provide the characters.
You're lucky that I have to run some big boring files in the background :-)
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
"Duncan" == Duncan Hothersall
writes:
You may want to give TeX-Live a test.
Duncan> It's usually my first port of call, but AFAIK it's not
Duncan> possible to control the way the web browser re-encodes stuff
Duncan> before it is submitted, so the results are not reliable.
Is there a collision in the name TeX Live? I was thinking of the CD/DVD sets. They have TeX for a wide range of systems, and you can run it directly from the CD for at least Windows and Linux/x86. (Hence "Live".)
-JimC
Duncan Hothersall wrote:
I think maybe it does. Is there anyone who is running the *minimal install* (from Hans' zip files) on either Windows or Linux who could test this for me? I just need you to try out a Unicode accented character within an <mtext> element inside MathML. Here's my template again - put a Unicode accented char where 'HERE' appears:
\useXMLfilter[utf]\usemodule[mathml]
\starttext\startXMLdata
<formula><math><mtext>HERE</mtext></math></formula>
\stopXMLdata\stoptext
It's usually my first port of call, but AFAIK it's not possible to control the way the web browser re-encodes stuff before it is submitted, so the results are not reliable. This is a real shame - TeX-Live is how I usually confirm all my queries.
I'm not sure about it, but my experience is that if I ask for a UTF-8 encoded page (at least in Mozilla), my input is also submitted as UTF-8. (When I worked with phpMyAdmin, for example, I had to use the same encoding as the database, which was sometimes annoying - the pages had to be shown in the wrong encoding in order to be able to work with the data properly.) The rest of the contextgarden pages already use UTF-8 encoding, but I'm not sure whether that would confuse newcomers who forget to add \enableregime[utf] at the beginning of the document.
I tried this out:
\enableregime[utf]
\useXMLfilter[utf]\usemodule[mathml]
\starttext
äöüčćšđž % rendered properly
\startXMLdata
<formula><math><mtext>äöüčćšđž</mtext></math></formula>
\stopXMLdata\stoptext
It doesn't work here either, but the problem doesn't seem to be in unicode, but in the rendering of accented characters. This example:
\usemodule[mathml]
\starttext\startXMLdata
<formula><math><mtext>\"{a}\"{o}\"{u}\v{c}\v{s}\v{z}</mtext></math></formula>
\stopXMLdata\stoptext
fails as well.
An interesting observation: I tested on live.contextgarden.net, on the latest ConTeXt in the MiKTeX distribution and in an old minimal ConTeXt distribution for Windows (6.12.2004). The results from MiKTeX and live.contextgarden.net were identical: \v{c} resulted in something like "leftdoubleguillemont" overlapped with "c", while \"{a}, \"{o} and \"{u} resulted in a dash over the letter a/o/u. In the minimal ConTeXt distribution the line \"{a}\"{o}\"{u}\v{c}\v{s}\v{z} resulted in "{a}"{o}"{u}v {c}v {s}v {z} (literally). The example with utf didn't even compile there.
Mojca
Mojca Miklavec wrote:
\usemodule[mathml]
\starttext\startXMLdata
<formula><math><mtext>\"{a}\"{o}\"{u}\v{c}\v{s}\v{z}</mtext></math></formula>
\stopXMLdata\stoptext
fails as well.
An interesting observation: I tested on live.contextgarden.net, on the latest ConTeXt in the MiKTeX distribution and in an old minimal ConTeXt distribution for Windows (6.12.2004). The results from MiKTeX and live.contextgarden.net were identical: \v{c} resulted in something like "leftdoubleguillemont" overlapped with "c", while \"{a}, \"{o} and \"{u} resulted in a dash over the letter a/o/u. In the minimal ConTeXt distribution the line \"{a}\"{o}\"{u}\v{c}\v{s}\v{z} resulted in "{a}"{o}"{u}v {c}v {s}v {z} (literally).
Let me tell you that it's even stranger:
\usemodule[mathml]
\starttext
\chardef\XMLtokensreduction\zerocount
\startXMLdata
<formula><math><mtext>\"{a}\"{o}\"{u}\v{c}\v{s}\v{z}</mtext></math></formula>
\stopXMLdata
\chardef\XMLtokensreduction\plustwo
\startXMLdata
<formula><math><mtext>\"{a}\"{o}\"{u}\v{c}\v{s}\v{z}</mtext></math></formula>
\stopXMLdata
\stoptext
Since the mml code currently re-enters the tokenizer, the text inside mtext is actually seen as TeX code.
The only way to get this fixed is to rewrite the mml parser (I've partially done that already, using the normal XML handler instead of the messy mapper); it seems that I need to speed up that port.
Hans
[...]
An interesting observation: I tested on live.contextgarden.net, on the latest ConTeXt in the MiKTeX distribution and in an old minimal ConTeXt distribution for Windows (6.12.2004). The results from MiKTeX and live.contextgarden.net were identical: \v{c} resulted in something like "leftdoubleguillemont" overlapped with "c", while \"{a}, \"{o} and \"{u} resulted in a dash over the letter a/o/u.
If there is something I have to change on contextgarden, please tell me off list (or on the dev list). I might miss it here.
Patrick
--
ConTeXt wiki and more: http://contextgarden.net
participants (5)
- Duncan Hothersall
- Hans Hagen
- James Cloos
- Mojca Miklavec
- Patrick Gundlach