[NTG-context] Chinese in utf-8

Lutz Haseloff Lutz.Haseloff at lbapdm.brandenburg.de
Mon Oct 17 06:48:18 CEST 2005


Hi Duncan,



Duncan Hothersall wrote:
> Hi all.
> 
> I have ConTeXt set up to output Chinese using \usemodule[chinese], all
> fonts, encodings and maps are installed and the sample file works well.
> 
> Now I have a whole load of Chinese text in utf-8 encoding. Can ConTeXt
> process this, or do I have to convert it to another encoding? I tried
> \enableregime[utf] and \useencoding[uc] but it just produced black blobs
> instead of Chinese characters.
> 
> I hope ConTeXt can do it? :-)
> 
> Thanks,
> 
> Duncan


I prepared a small Perl script that converts Chinese UTF-8 encoded
TeX files into GBK-encoded TeX files. I run it right before
texexec.pl, which then creates a PDF from the resulting TeX file.
The advantage is that you can use both simplified and traditional
characters in one file, provided your font files cover the full
GBK range (all the Chinese ht*.ttf fonts do). Meanwhile you can
still see all the Chinese characters on screen in any
Unicode-enabled editor (e.g. SciTE).
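For comparison, the bare UTF-8 to GBK step can be sketched with Perl's core Encode module instead of Encode::HanConvert. This is a swapped-in technique, not what the script below does: it assumes GBK is acceptable as Microsoft code page 936 (Encode's "cp936"), and the helper names utf8_to_gbk and convert_tex_file are hypothetical. It does not escape TeX-special trailing bytes; the script's second pass handles that.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn a string of UTF-8 octets into GBK octets.
# GBK is Microsoft code page 936, known to core Encode as "cp936".
# FB_CROAK makes both steps die on malformed UTF-8 or on a character
# GBK cannot represent, instead of silently substituting "?".
sub utf8_to_gbk {
    my ($utf8_octets) = @_;
    my $text = decode('UTF-8', $utf8_octets, Encode::FB_CROAK);
    return encode('cp936', $text, Encode::FB_CROAK);
}

# Hypothetical file-level wrapper: foo.tex (UTF-8) -> foo-gbk.tex (GBK).
sub convert_tex_file {
    my ($name) = @_;
    $name =~ s/\.tex$//i;
    open(my $in, '<:raw', "$name.tex") or die "cannot read $name.tex: $!\n";
    my $octets = do { local $/; <$in> };    # slurp the whole file
    close($in);
    open(my $out, '>:raw', "$name-gbk.tex")
        or die "cannot write $name-gbk.tex: $!\n";
    print {$out} utf8_to_gbk($octets);
    close($out) or die "cannot close $name-gbk.tex: $!\n";
    return "$name-gbk.tex";
}
```

Because cp936 covers the traditional CJK ideographs as well as the simplified ones, a traditional character such as U+570B encodes to two GBK bytes here too.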

Here you are:

utf82gbk.pl

-----------------------------

#!/usr/bin/perl -w

use strict;
use Encode::HanConvert;    # provides simp_to_gb()

our ($filename, $recoded);

$filename = $ARGV[0];
$filename =~ s/\.tex$//i;

# Step 1: read the UTF-8 file and write it out GBK-encoded.
if (open(INP, "<:encoding(UTF-8)", "$filename.tex"))
     {
       print "processing file $filename.tex\n";
       local $/;               # slurp the whole file at once
       $_ = <INP>;
       close(INP);
       simp_to_gb($_);         # in void context: converts $_ to GBK in place
       if (open(OUT, ">:raw", "$filename-gbk.tex"))
         { print OUT $_;
           close(OUT);
         }
     }
   else
     { die "invalid filename\n" }
if (-e "$filename-gbk.tex") { print "created file $filename-gbk.tex\n" }

# Called for every GBK byte pair. If the trailing byte is an ASCII
# character other than a letter or digit (\, {, } ...), TeX would
# misread it, so the pair is replaced by \uc{lead}{trail}.
sub unirecode
  { my ($lead, $trail) = @_;
    if ((ord($trail) < 0x80) && ($trail !~ /[a-zA-Z0-9]/))
      { print $trail; ++$recoded;    # echo the escaped byte
        return "\\uc{" . ord($lead) . "}{" . ord($trail) . "}" }
    else
      { return "$lead$trail" } }

# Step 2: escape TeX-special trailing bytes in the GBK file.
if (open(INP, "<:raw", "$filename-gbk.tex"))
     { $recoded = 0;
       print "processing file $filename-gbk.tex ";
       local $/;
       $_ = <INP>;
       close(INP);
       s/([\x80-\xFF])(.)/unirecode($1,$2)/ge;
       if (($recoded) && (open(OUT, ">:raw", "$filename-gbk.tmp")))
         {  print OUT $_;
            close(OUT);
            # keep the unescaped file as a backup (.tec), then move
            # the escaped version into place
            unlink "$filename-gbk.tec";
            rename "$filename-gbk.tex", "$filename-gbk.tec";
            rename "$filename-gbk.tmp", "$filename-gbk.tex";
         }
       if ($recoded)
         { print " - $recoded glyphs recoded - original saved as $filename-gbk.tec\n" }
       else
         { print "- no glyphs recoded\n" } }
   else
     { print "invalid filename\n" }


-----------------------------
usage:
perl utf82gbk.pl filename.tex
texexec filename-gbk.tex

It combines Hans Hagen's tex2uc.pl, which converts two-byte codes
containing TeX-related characters (\, {, } ...) into \unicodeglyph
commands (written here as \uc{lead}{trail}), with a simple UTF-8
to GBK converter. It needs the Perl module Encode::HanConvert.
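Why the second pass matters: in GBK the trailing byte of a two-byte character may fall in the ASCII range (0x40 to 0x7E), so a Chinese character can contain a byte TeX reads as \, { or }. Below is a stand-alone sketch of the same escaping rule as unirecode() above, working on a GBK byte string in memory; the function names escape_pair and escape_tex_trail_bytes are hypothetical, and the \uc{lead}{trail} form simply mirrors what the script emits.

```perl
use strict;
use warnings;

# Hypothetical helper: replacement text for one GBK byte pair.
# Pairs whose trailing byte is non-ASCII, a letter or a digit are
# left alone; anything else (e.g. \ = 0x5C, { = 0x7B, } = 0x7D)
# becomes \uc{lead}{trail} with decimal byte values.
sub escape_pair {
    my ($lead, $trail) = @_;
    return $lead . $trail
        if ord($trail) >= 0x80 || $trail =~ /[A-Za-z0-9]/;
    return sprintf('\\uc{%d}{%d}', ord($lead), ord($trail));
}

# Hypothetical helper: escape a whole GBK byte string.
# The scan runs left to right and consumes each byte >= 0x80 together
# with the byte after it, so lead/trail pairing stays aligned.
sub escape_tex_trail_bytes {
    my ($gbk) = @_;
    $gbk =~ s/([\x80-\xFF])(.)/escape_pair($1, $2)/ge;
    return $gbk;
}
```

For example, the byte pair 0x81 0x5C comes out as \uc{129}{92}, while ordinary GB2312 characters (both bytes at or above 0xA1) pass through unchanged.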

I created two new menu entries in my SciTE editor:
"Create gbk texfile", which creates filename-gbk.tex, and
"Process gbk texfile", which runs texexec on this new file.
It works very well for me.

I hope this helps a bit until pdfTeX can handle Unicode.

Greetings from Potsdam, Germany

Lutz

P.S. Excuse my bad English.

