[NTG-context] Chinese in utf-8
Lutz Haseloff
Lutz.Haseloff at lbapdm.brandenburg.de
Mon Oct 17 06:48:18 CEST 2005
Hi Duncan,
Duncan Hothersall wrote:
> Hi all.
>
> I have ConTeXt set up to output Chinese using usemodule[chinese], all
> fonts, encodings and maps are installed and the sample file works well.
>
> Now I have a whole load of Chinese text in utf-8 encoding. Can ConTeXt
> process this, or do I have to convert it to another encoding? I tried
> \enableregime[utf] and \useencoding[uc] but it just produced black blobs
> instead of Chinese characters.
>
> I hope ConTeXt can do it? :-)
>
> Thanks,
>
> Duncan
I prepared a small Perl script that converts Chinese utf-8 encoded
TeX files to GBK-encoded TeX files. I call it right
before using texexec.pl to create a PDF from the resulting
TeX file. It has the advantage that you can use both simplified
and traditional characters in one file, provided you have font files
with full GBK coverage (all Chinese ht*.ttf).
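The encoding step on its own can be sketched with Perl's core Encode module. This is only an illustration under that assumption (the script in this mail uses Encode::HanConvert instead), and the sample characters are chosen here, not taken from Duncan's text. It shows the GBK bytes of two characters and checks that a simplified and a traditional character both survive a round trip through GBK.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 bytes of the two characters U+4E2D U+6587 (sample input, an assumption)
my $utf8_bytes = "\xE4\xB8\xAD\xE6\x96\x87";

my $text = decode('UTF-8', $utf8_bytes);   # bytes -> Perl Unicode string
my $gbk  = encode('gbk', $text);           # Unicode string -> GBK bytes

# Show the GBK bytes in hex
print uc(unpack('H*', $gbk)), "\n";        # D6D0CEC4

# One simplified (U+56FD) and one traditional (U+570B) character:
# both are representable in GBK, so they can share one file.
my $mixed = "\x{56FD}\x{570B}";
my $back  = decode('gbk', encode('gbk', $mixed));
print $back eq $mixed ? "round trip ok\n" : "round trip failed\n";
```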
You can easily view all the Chinese characters on screen with any
Unicode-enabled editor (SciTE).
Here you are:
utf82gbk.pl
-----------------------------
#!/usr/bin/perl -w

use strict;
use utf8;
use Encode::HanConvert;

our ($filename, $recoded);

die "usage: utf82gbk.pl filename.tex\n" unless @ARGV ;

$filename = $ARGV[0];
$filename =~ s/\.tex$//io ;

# pass 1: read the utf-8 file and write it back out as gbk

if (open(INP,"<:utf8","$filename.tex"))
  { print "processing file $filename.tex\n" ;
    undef $/ ;                    # slurp the whole file at once
    $_ = <INP> ;
    close(INP) ;
    simp_to_gb($_) ;              # converts $_ in place to gbk
    use bytes ;                   # the data is gbk bytes from here on
    if (open(OUT,">","$filename-gbk.tex"))
      { print OUT $_ ;
        close(OUT) ; } }
else
  { print "invalid filename\n" }

if (-e "$filename-gbk.tex")
  { print "created file $filename-gbk.tex\n" }

# a gbk pair whose second byte is ascii but not alphanumeric
# (\, {, } ...) would confuse tex, so it is written as \uc{hi}{lo}

sub unirecode
  { my ($a,$b) = @_ ;
    if ((ord($b)<0x80)&&($b !~ /[a-zA-Z0-9]/))
      { print "$b" ; ++$recoded ;          # show progress and count
        return "\\uc\{" . ord($a) . "\}\{". ord($b) . "\}" }
    else
      { return "$a$b" } }

# pass 2: recode the problematic pairs in the gbk file

if (open(INP,"$filename-gbk.tex"))
  { $recoded = 0 ;
    print "processing file $filename-gbk.tex " ;
    undef $/ ;
    $_ = <INP> ;
    close(INP) ;
    s/([\x80-\xFF])(.)/unirecode($1,$2)/mgoe ;
    if (($recoded)&&(open(OUT,">$filename-gbk.tmp")))
      { print OUT $_ ;
        close(OUT) ;
        # keep the unrecoded file as .tec, as the message below says
        unlink "$filename-gbk.tec" ;
        rename "$filename-gbk.tex", "$filename-gbk.tec" ;
        rename "$filename-gbk.tmp", "$filename-gbk.tex" ; }
    if ($recoded)
      { print " - $recoded glyphs recoded - original saved as $filename-gbk.tec\n" }
    else
      { print "- no glyphs recoded\n" } }
else
  { print "invalid filename\n" }
-----------------------------
usage:
perl utf82gbk.pl filename.tex
texexec filename-gbk.tex
It's a combination of Hans Hagen's tex2uc.pl, which converts
codes containing TeX-related characters (\, {, } ...) into
\unicodeglyph commands, and a simple utf-8 to gbk converter.
It needs the module Encode::HanConvert.
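To illustrate why the second step is needed: in GBK the second byte of a character can fall in the ASCII range, so it can be a backslash or a brace. The following is a minimal sketch of that recoding on a raw byte string. The helper names recode_pair and recode_pairs and the sample bytes are made up for this example and are not part of utf82gbk.pl or tex2uc.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: decide what to do with one two-byte sequence.
sub recode_pair {
    my ($hi, $lo) = @_;
    # ASCII but not alphanumeric trail byte: write the pair as \uc{hi}{lo}
    if (ord($lo) < 0x80 && $lo !~ /[a-zA-Z0-9]/) {
        return sprintf('\\uc{%d}{%d}', ord($hi), ord($lo));
    }
    return "$hi$lo";    # harmless pair: leave it untouched
}

# Hypothetical helper: apply recode_pair to every high byte and its follower.
sub recode_pairs {
    my ($bytes) = @_;
    $bytes =~ s/([\x80-\xFF])(.)/recode_pair($1, $2)/ge;
    return $bytes;
}

# 0x81 0x5C has a backslash (0x5C) as trail byte; 0xB0 0xA1 is a normal pair.
my $in  = "a" . "\x81\x5C" . "b" . "\xB0\xA1" . "c";
my $out = recode_pairs($in);

print length($in), " bytes in, ", length($out), " bytes out\n";
```

Only the first pair is rewritten (as \uc{129}{92}); the second pair is passed through unchanged.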
I created two new menu entries in my SciTE editor:
"Create gbk texfile", which creates filename-gbk.tex, and
"Process gbk texfile", which runs texexec on this new file.
It works very well for me.
I hope this helps a bit until pdftex can handle unicode.
Greetings from Potsdam, Germany
Lutz
P.S. Excuse my bad English.