Re: [NTG-context] <mtext> UTF further problems
Duncan> I'm still having trouble with UTF contents in <mtext> tags in
Duncan> MathML content.
For whatever it is worth, I just tried that. A double-acute u (U+0171) came through w/o problem. I'm using a gentoo box w/ tetex 3.0.
Thanks very much for trying it out.
I tried both dvi and pdf output. Both worked.
Are you sure your file is in utf-8 and not, eg, utf-16?
I was, but I'm no longer sure of anything. :-) Is there a foolproof way of finding out? It seems that lots of editors try to 'help' by doing automatic guessing and automatic translations into other encodings, making it very difficult to tie things down. (And web browsers do the same when submitting things over the web, so I can't do what I usually do and test on Live.) I tend to use emacs, which I thought was a pretty safe bet, but maybe I should try something else?
What platform are you on?
I'm testing on both Windows and (Red Hat) Linux, both with the current minimal ConTeXt installations (i.e. mswintex.zip and linuxtex.zip). They exhibit the same behaviour. Thanks for any advice you can give.

Duncan
"Duncan" == Duncan Hothersall
writes:
Are you sure your file is in utf-8 and not, eg, utf-16?
Duncan> I was, but I'm no longer sure of anything. :-) Is there a
Duncan> foolproof way of finding out?
(First, I cannot comment usefully wrt this topic and windows.)
Try this at a shell prompt:
env LANG=C LC_ALL=C cat --show-all FileName
where FileName is the file in question. The non-ascii characters will
be output as strings that look like M-? where ? is a single ascii character.
If you see a single M-? triplet in place of each non-ascii character
you do not have utf-8. If you see between two and four such triplets
for each non-ascii character in the document it is probably utf-8.
(If you see a ^@ interleaved with each ascii char you have utf-16.)
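To make that concrete, here is a minimal sketch (the sample file names
and bytes are mine, not from the thread) comparing the same character
in utf-8 and in iso8859-2:

    # U+0171 (u with double acute) is the two bytes 0xC5 0xB1 in
    # utf-8, but the single byte 0xFB in iso8859-2.
    printf 'f\xc5\xb1\n' > utf8.txt      # "fű" encoded as utf-8
    printf 'f\xfb\n'     > latin2.txt    # the same text in iso8859-2
    env LANG=C LC_ALL=C cat --show-all utf8.txt    # prints: fM-EM-1$
    env LANG=C LC_ALL=C cat --show-all latin2.txt  # prints: fM-{$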
Of course, context would not be able to deal with utf16 on linux;
tex would just get confused by the interspersed NULLs (represented
as ^@ in the --show-all output described above) in the initial lines.
So if it is an encoding problem, it is more likely that you are ending
up with a file in one of the iso8859 8-bit encodings.
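As an aside, if your file(1) is recent enough to guess text encodings,
it gives a quicker first check (treat its answer as a hint, not proof):

    file FileName
    # typical output:  FileName: UTF-8 Unicode text
    # versus:          FileName: ISO-8859 text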
A (not-so-?)quick test is this. Save it w/o the leading blanks
and run it, passing a filename as a single argument.
    #!/bin/bash
    # Usage: pass the file to test (e.g. foo.tex) as the single argument.
    # iconv fails for the iso8859 parts that do not exist (there is no
    # iso8859-12), so the && just skips texexec in those cases.
    for ij in $(seq 1 15); do
        iconv -f iso8859-${ij} -t utf8 <"$1" >"from-${ij}-$1" && \
            texexec "from-${ij}-$1"
    done
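For example (the script name here is my placeholder), assuming your
problem file is foo.tex:

    ./try-encodings.sh foo.tex
    # writes from-1-foo.tex ... from-15-foo.tex, and runs texexec on
    # each one that iconv managed to convert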
Then test all of the generated dvi files to see whether any worked.
Duncan> I tend to use emacs, which I thought was a pretty safe bet,
Duncan> but maybe I should try something else?
I also use emacs, but from cvs. (Gentoo has an emacs-cvs ebuild that
makes that easy.) I also run with LANG=en_US.UTF-8 and several
settings in emacs to prefer utf8. The emacs-unicode-2 branch in CVS
(what will become emacs-23; CVS HEAD will become emacs-22) is even
better for this since it uses unicode as its internal representation.
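If you want to try the same locale setup, a minimal sketch (assuming
the en_US.UTF-8 locale is installed on your system):

    locale -a | grep -i utf      # list the utf-8 locales you have
    export LANG=en_US.UTF-8      # e.g. in ~/.bashrc or ~/.profile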
Duncan> I'm testing on both Windows and (Redhat) linux, both with the
Duncan> current minimal ConTeXt installations (i.e. mswintex.zip and
Duncan> linuxtex.zip). They exhibit the same behaviour.
I've only tested on tetex-3. That may make a difference....
You may want to give TeX-Live a test.
-JimC
--
James H. Cloos, Jr.