Hi all.

I have ConTeXt set up to output Chinese using \usemodule[chinese]; all fonts, encodings and maps are installed, and the sample file works well.

Now I have a whole load of Chinese text in UTF-8 encoding. Can ConTeXt process this, or do I have to convert it to another encoding? I tried \enableregime[utf] and \useencoding[uc], but it just produced black blobs instead of Chinese characters.

I hope ConTeXt can do it? :-)

Thanks,
Duncan
Please post the output of the texexec command. Maybe ConTeXt fails to find some files?
--
Radhelorn
That's tricky. The UTF handler assumes named glyphs, and so far no one has named the 5000 Chinese ones. (Some day pdfTeX will be Unicode aware, and then these problems will disappear.)

Within the current UTF handling mechanism I can envision something:

- the UTF code results in an expansion of the vector
- instead of using a named glyph, we use a trick

Some variant on:

  \startunicodevector chinese_unicode_page_number_1
    getglyph{ChineseFont1}{#1}%
  \stopunicodevector

or, probably because of some trickery used in the handler (untested), something like the following (not sure; best make a new command):

  \startunicodevector chinese_unicode_page_number_1
    getglyph\endcsname{ChineseFont1}{#1}\gobbleoneargument
  \stopunicodevector

So then you only need to define the right fonts, i.e.

  \definefont[ChineseFont1][whateverchinesefont_1]

which has the right glyphs in the right slots. So ... it's actually simple, once you have the fonts split up.

Probably the getglyph needs to be replaced by a more clever one that handles special Chinese thingies. Another option is to write another mapper, analogous to the ones already there for Chinese: is there some mapping from UTF to Big5 or so? If so, hook that into the UTF handler.

(Beware: the font-chi modules talk about Unicode, while actually they are about dedicated mappings resembling a Unicode approach; this \defineucharmapping stuff.)

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
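To make the "fonts split up" idea concrete: if every 256-character block of Unicode (a "page", the code point shifted right by 8 bits) is served by its own font whose slot numbers equal the low byte of the code point, then each character maps to a (font, slot) pair. The sketch below is a hypothetical preprocessing step in Perl, not the in-TeX vector mechanism described above. It assumes fonts named ChineseFont<page in hex> (modelled on ChineseFont1 in the mail, otherwise made up) and it is untested whether \getglyph accepts a \char in its second argument.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a Unicode code point into a 256-slot "page" (high bits) and the
# slot within that page (low byte).
sub page_and_slot {
    my ($cp) = @_;
    return ($cp >> 8, $cp & 0xFF);
}

# Build the TeX call for one character. ASSUMPTION: a font named
# ChineseFont<page in hex> (hypothetical naming) holds the glyph for
# code point page*256+slot in slot <slot>.
sub glyph_call {
    my ($cp) = @_;
    my ($page, $slot) = page_and_slot($cp);
    return sprintf '\getglyph{ChineseFont%X}{\char%d}', $page, $slot;
}

# Replace every CJK Unified Ideograph (U+4E00 to U+9FFF) in a decoded
# character string; everything else passes through unchanged.
sub cjk_to_getglyph {
    my ($text) = @_;
    $text =~ s/([\x{4E00}-\x{9FFF}])/glyph_call(ord $1)/ge;
    return $text;
}

# U+4E2D lies on page 0x4E, slot 0x2D (= 45).
print cjk_to_getglyph("A\x{4E2D}B"), "\n";
# prints: A\getglyph{ChineseFont4E}{\char45}B
```

Note the cost of this layout: the basic ideograph block alone spans pages 0x4E through 0x9F, i.e. 82 fonts of up to 256 glyphs each, which is why splitting the fonts is the real work.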
Hi Duncan,
I prepared a small Perl script to convert Chinese UTF-8 encoded TeX files to GBK-encoded TeX files. I call it right before running texexec.pl to create a PDF from the resulting TeX file. It has the advantage that you can use both simplified and traditional characters in one file, provided you have full GBK-enabled font files (all the Chinese ht*.ttf fonts). You can easily see all Chinese characters on screen with any Unicode-enabled editor (such as SciTE).

Here you are:

utf82gbk.pl
-----------------------------
#!/usr/bin/perl -w
use strict;
use utf8;
use Encode::HanConvert;   # provides simp_to_gb()

our ($filename, $recoded);

@ARGV or die "usage: perl utf82gbk.pl filename.tex\n";
$filename = $ARGV[0];
$filename =~ s/\.tex$//i;

# Step 1: read the UTF-8 source and write it out as GBK.
if (open(INP, "<:utf8", "$filename.tex")) {
    print "processing file $filename.tex\n";
    local $/;                  # slurp mode: read the whole file at once
    $_ = <INP>;
    close(INP);
    simp_to_gb($_);            # called in void context: converts $_ in place
    use bytes;
    if (open(OUT, ">", "$filename-gbk.tex")) {
        print OUT $_;
        close(OUT);
    }
} else {
    print "invalid filename\n";
}
if (-e "$filename-gbk.tex") { print "created file $filename-gbk.tex\n" }

# Wrap a double-byte character whose second byte is a non-alphanumeric
# ASCII character (e.g. \ { } ) in \uc{lead}{trail}, so TeX never sees
# that byte as a special character.
sub unirecode {
    my ($lead, $trail) = @_;
    if ((ord($trail) < 0x80) && ($trail !~ /[a-zA-Z0-9]/)) {
        print $trail;
        ++$recoded;
        return "\\uc\{" . ord($lead) . "\}\{" . ord($trail) . "\}";
    } else {
        return "$lead$trail";
    }
}

# Step 2: escape GBK trail bytes that clash with TeX special characters.
if (open(INP, "<", "$filename-gbk.tex")) {
    $recoded = 0;
    print "processing file $filename-gbk.tex ";
    local $/;
    $_ = <INP>;
    close(INP);
    s/([\x80-\xFF])(.)/unirecode($1,$2)/ge;
    if ($recoded && open(OUT, ">", "$filename-gbk.tmp")) {
        print OUT $_;
        close(OUT);
        # keep the unescaped file as .tec, then move the new one into place
        rename "$filename-gbk.tex", "$filename-gbk.tec";
        rename "$filename-gbk.tmp", "$filename-gbk.tex";
    }
    if ($recoded) {
        print " - $recoded glyphs recoded - original saved as $filename-gbk.tec\n";
    } else {
        print " - no glyphs recoded\n";
    }
} else {
    print "invalid filename\n";
}
-----------------------------

Usage:
  perl utf82gbk.pl filename.tex
  texexec filename-gbk.tex

It's a combination of Hans Hagen's tex2uc.pl, which converts codes including TeX-related characters (\, {, } ...) into \unicodeglyph commands, and an easy UTF-8 to GBK converter. It needs the module Encode::HanConvert.
I created two new menu entries in my SciTE editor: "Create gbk texfile", which creates filename-gbk.tex, and "Process gbk texfile", which runs texexec on this new file. It works very well for me.

I hope this helps a bit until pdfTeX can handle Unicode.

Greetings from Potsdam, Germany
Lutz

P.S. Excuse my bad English
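The core of Lutz's utf82gbk.pl can also be written as pure functions that work on strings instead of temporary files, which makes the escaping rule easy to check. This is a sketch, not his script: the function names are hypothetical, it swaps Encode::HanConvert's simp_to_gb for the core Encode module's cp936 (GBK) codec, and it keeps no .tec backup.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Wrap one GBK double-byte character in \uc{lead}{trail} when its trail
# byte is a non-alphanumeric ASCII character (such as \ { } ^ _ ~),
# mirroring the test in unirecode() in utf82gbk.pl.
sub escape_pair {
    my ($lead, $trail) = @_;
    if (ord($trail) < 0x80 && $trail !~ /[A-Za-z0-9]/) {
        return sprintf '\uc{%d}{%d}', ord($lead), ord($trail);
    }
    return $lead . $trail;
}

# Scan a GBK byte string left to right: a byte >= 0x80 starts a
# double-byte character, so it is consumed together with the next byte.
sub escape_gbk_bytes {
    my ($bytes) = @_;
    $bytes =~ s/([\x80-\xFF])([\x00-\xFF])/escape_pair($1, $2)/ge;
    return $bytes;
}

# UTF-8 bytes in, TeX-safe GBK bytes out. Uses the core Encode module's
# cp936 (GBK) codec in place of Encode::HanConvert::simp_to_gb.
sub utf8_to_escaped_gbk {
    my ($utf8_bytes) = @_;
    my $gbk = encode('cp936', decode('UTF-8', $utf8_bytes));
    return escape_gbk_bytes($gbk);
}

# U+4E2D is E4 B8 AD in UTF-8 and D6 D0 in GBK (no escaping needed).
printf "%vX\n", utf8_to_escaped_gbk("\xE4\xB8\xAD");
# prints: D6.D0
```

The result is a byte string, so write it to a file opened with a raw layer (e.g. `open(my $out, '>:raw', $name)`). Characters that GBK cannot represent are replaced with a substitution character by Encode's default behaviour rather than reported.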
Participants (4): Duncan Hothersall, Hans Hagen, Lutz Haseloff, Radhelorn