Re: Chinese

Duncan Hothersall

13 Dec 2005

9:07 a.m.

Hans wrote:

...

chinese is not yet defined in utf so if you want that, we need to do it ... assuming this, how about making a set of tfm,enc,map files that match the unicode positions (volunteers ...)

I'm very willing to help, especially if there is some drudge work involved in constructing the files. I don't know enough (yet) about the logic of it all to help with setting up the system, but if someone can supply skeleton files and/or a method for constructing the necessary files, I'm happy to do any leg-work.

Duncan

Hans Hagen

13 Dec 2005

10:52 a.m.

New subject: Chinese

Duncan Hothersall wrote:

...

Hans wrote:

...
chinese is not yet defined in utf so if you want that, we need to do it

...

...
assuming this, how about making a set of tfm,enc,map files that match the unicode positions (volunteers ...)

I'm very willing to help, especially if there is some drudge work involved in constructing the files. I don't know enough (yet) about the logic of it all to help with setting up the system, but if someone can supply skeleton files and/or a method for constructing the necessary files, I'm happy to do any leg-work.

what we need is a set of encoding files like

/UniEncoding52 [ .... /uni52DF /uni52E0 /uni52E1 /uni52E2 /uni52E3 /uni52E4 ... /.notdef .... ] def

that represent the ranges and can be used to construct tfm files (or whatever index entry is needed in order to filter the metrics from the ttf file)

maybe patrick's font code already can do that:

- read in a ttf file (or a glyph list produced by ttf2tfm or ttf2afm)
- make a range of enc and tfm files

actually, this is rather generic, since pdftex can handle symbolic names like /index... and /uni..., so if we have such a set, we can stick to one bunch of enc files

the utf handler can then simply access char E1 from htsong-52.tfm

testing is rather simple:

\pdfmapline{htsong-52
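The lookup described in this message (the utf handler taking char E1 from htsong-52.tfm) might be sketched in Python as follows. This is only an illustration, not ConTeXt's actual utf handler: the function names `locate` and `plan` are hypothetical, and the two-digit lowercase hex suffix is an assumption extrapolated from the single name htsong-52 given above.

```python
def locate(char, family="htsong"):
    """Split one BMP character into (tfm name, slot inside that tfm).

    Assumption: one tfm/enc pair per block of 256 code points, named
    <family>-<high byte in hex>, as in the htsong-52 example.
    """
    codepoint = ord(char)
    if codepoint > 0xFFFF:
        raise ValueError("only the Basic Multilingual Plane is handled here")
    block, slot = divmod(codepoint, 256)
    return f"{family}-{block:02x}", slot


def plan(text, family="htsong"):
    """Return the (tfm name, slot) pair for every character in text."""
    return [locate(char, family) for char in text]


if __name__ == "__main__":
    for name, slot in plan("\u52e1\u52e2"):
        print(f"{name}.tfm char {slot:02X}")
```

With this split, every code point in the range U+5200 to U+52FF ends up in the same font, which is what allows one generic set of enc files to serve all the fonts.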

sjoerd siebinga

11:03 a.m.

New subject: Chinese

On 13 Dec 2005, at 10:52, Hans Hagen wrote:

...

Duncan Hothersall wrote:

...
Hans wrote:

...
chinese is not yet defined in utf so if you want that, we need to do it

...

...
assuming this, how about making a set of tfm,enc,map files that match the unicode positions (volunteers ...)

I'm very willing to help, especially if there is some drudge work involved in constructing the files. I don't know enough (yet) about the logic of it all to help with setting up the system, but if someone can supply skeleton files and/or a method for constructing the necessary files, I'm happy to do any leg-work.

what we need is a set of encoding files like

/UniEncoding52 [ .... /uni52DF /uni52E0 /uni52E1 /uni52E2 /uni52E3 /uni52E4 ... /.notdef .... ] def

I have made a Ruby-script (for personal use loosely based on Adam's xsl-files) which generates all the encoding- and symbolfiles from a given cmapfile. If someone could send me the ttf-font, I can generate all the necessary encodingfiles for you.

Sjoerd
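For readers who want to see the shape of such a generator, here is a minimal Python sketch. It is not Sjoerd's Ruby script, which is not shown in this thread. It assumes the cmap has already been reduced to a plain collection of covered code points, and the name `encoding_vectors` is hypothetical. Slots the font does not cover are filled with /.notdef, as in the vector quoted above.

```python
from collections import defaultdict


def encoding_vectors(codepoints):
    """Build one 256-entry encoding vector per block of 256 code points.

    codepoints: iterable of BMP code points the font covers (assumed to
    have been extracted from the font's cmap beforehand).
    Returns a dict mapping the block number (high byte) to the vector text.
    """
    blocks = defaultdict(set)
    for codepoint in codepoints:
        blocks[codepoint >> 8].add(codepoint & 0xFF)

    vectors = {}
    for block, slots in sorted(blocks.items()):
        names = [
            f"/uni{(block << 8) | slot:04X}" if slot in slots else "/.notdef"
            for slot in range(256)
        ]
        vectors[block] = f"/UniEncoding{block:02X} [ " + " ".join(names) + " ] def"
    return vectors


if __name__ == "__main__":
    for block, text in encoding_vectors([0x52DF, 0x52E0, 0x52E1]).items():
        print(f"block {block:02X}: {len(text)} characters of encoding vector")
```

Only blocks that contain at least one covered code point produce a vector, so a font with sparse coverage yields correspondingly few enc files.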

Hans Hagen

11:34 a.m.

New subject: Chinese

sjoerd siebinga wrote:

...

I have made a Ruby-script (for personal use loosely based on Adam's xsl-files) which generates all the encoding- and symbolfiles from a given cmapfile. If someone could send me the ttf-font, I can generate all the necessary encodingfiles for you.

the chinese fonts mentioned in the context garden qualify for such a treatment (htsong cum suis)

Hans

sjoerd siebinga

12:26 p.m.

New subject: Chinese

On 13 Dec 2005, at 11:34, Hans Hagen wrote:

...

sjoerd siebinga wrote:

...
I have made a Ruby-script (for personal use loosely based on Adam's xsl-files) which generates all the encoding- and symbolfiles from a given cmapfile. If someone could send me the ttf-font, I can generate all the necessary encodingfiles for you.

the chinese fonts mentioned in the context garden qualify for such a treatment (htsong cum suis)

Ok. Where can I send the chinese encodingfiles?

Hans Hagen

2:02 p.m.

New subject: your mails at go

sjoerd siebinga wrote:

...

On 13 Dec 2005, at 11:34, Hans Hagen wrote:

...
sjoerd siebinga wrote:

...
I have made a Ruby-script (for personal use loosely based on Adam's xsl-files) which generates all the encoding- and symbolfiles from a given cmapfile. If someone could send me the ttf-font, I can generate all the necessary encodingfiles for you.

the chinese fonts mentioned in the context garden qualify for such a treatment (htsong cum suis)

Ok. Where can I send the chinese encodingfiles?

you can send me a zip

maybe we should start thinking on how to set up a repository at https://foundry.supelec.fr/

taco and patrick have more experience in this area than i have so maybe they have some ideas on how to organize this

Hans

Tobias Burnus

11:46 a.m.

New subject: Chinese

Hi,

sjoerd siebinga wrote:

...

I have made a Ruby-script (for personal use loosely based on Adam's xsl-files) which generates all the encoding- and symbolfiles from a given cmapfile. If someone could send me the ttf-font, I can generate all the necessary encodingfiles for you.

Nice! The recommended (by Xiao Jianfeng) TrueType fonts are given at http://wiki.contextgarden.net/Chinese

They are

ftp://ftp.ctex.org/pub/tex/fonts/truetype/ttf/htfs.ttf
ftp://ftp.ctex.org/pub/tex/fonts/truetype/ttf/hthei.ttf
ftp://ftp.ctex.org/pub/tex/fonts/truetype/ttf/htkai.ttf
ftp://ftp.ctex.org/pub/tex/fonts/truetype/ttf/htsong.ttf

Richard Gabriel wrote:

...

But yet another question: What about Japanese? I've done only a little research so far, but unlike Chinese, there's almost no information about Japanese in TeX. How much work would it be to adjust the current "chinese" ConTeXt module for Japanese? What would you need for it? [Of course, meanwhile I'll investigate some other ways of typesetting Japanese...]

(I don't know much about Japanese.)

In Japanese, contrary to Chinese, different character sets are mixed:

- The Chinese characters ("Kanji"), which seem to make up most of the (scientific) text I've seen.
- In addition, some pronunciation-based characters ("Kana") are used: Hiragana and Katakana. The former are rather round characters in Japanese texts; the most prominent should be "の" [means something like "of" in English]. They are mostly used for suffixes/prefixes where no Chinese equivalent exists, whereas Katakana is used to write words which have been taken from (mostly) European languages.

For Kanji there should be no problem with the Chinese module; for Kana you need additional support for these characters. Since they are pronunciation based, they only consist of < 50 characters each.

Tobias

(Hmm, I never thought I would end up so deep in linguistics during my PhD thesis in physics. But having three Chinese in the group and regularly doing some measurements at a research centre in Taiwan, I couldn't help picking up something.)
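To make the Kanji/Kana distinction concrete, here is a small Python sketch that classifies characters by Unicode block and lists which 256-code-point blocks a text touches. The block boundaries come from the Unicode standard rather than from this thread, only the main CJK Unified Ideographs block is counted as Kanji, and the names `script_of` and `blocks_needed` are hypothetical.

```python
# Unicode block boundaries (from the Unicode standard, not from this thread).
RANGES = (
    ("hiragana", 0x3040, 0x309F),
    ("katakana", 0x30A0, 0x30FF),
    ("kanji", 0x4E00, 0x9FFF),  # main CJK Unified Ideographs block only
)


def script_of(char):
    """Return 'hiragana', 'katakana', 'kanji' or 'other' for one character."""
    codepoint = ord(char)
    for name, first, last in RANGES:
        if first <= codepoint <= last:
            return name
    return "other"


def blocks_needed(text):
    """Return the sorted high bytes of all Kanji and Kana characters in text."""
    return sorted({ord(char) >> 8 for char in text if script_of(char) != "other"})


if __name__ == "__main__":
    sample = "\u6f22\u5b57\u306e\u30ab"  # two Kanji, one Hiragana, one Katakana
    print([script_of(char) for char in sample])
    print([f"{block:02X}" for block in blocks_needed(sample)])
```

Both Kana blocks share the high byte 30, so under a one-file-per-256-code-points scheme they would sit together in a single extra block next to the Kanji blocks.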

Hans Hagen

11:56 a.m.

New subject: Chinese

Tobias Burnus wrote:

...

(Hmm, I never thought I would end up so deep in linguistics during my PhD thesis in physics. But having three Chinese in the group and regularly doing some measurements at a research centre in Taiwan, I couldn't help picking up something.)

well, there is a certain charm in those characters, even if you cannot read them (during a 2*10 hour trip in a chinese bus during the last tug conference one quickly learns to recognize the symbols for gas stations and such -)

browsing a chinese-english dictionary is also fun (i have a small one on my desk; some day i should start collecting dictionaries of all languages that context supports -); with a bit of puzzling one can find out the system behind the way words are made up

Hans

Adam Lindsay

1:33 p.m.

New subject: Chinese

Hans Hagen wrote:

...

what we need is a set of encoding files like

/UniEncoding52 [ .... /uni52DF /uni52E0

I hate to be negative, but I have doubts about how generic this approach may be. In some tentative experiments, I discovered that many (most?) CJK fonts don't use traditional postscript names, but rather map from unicode to an indexed glyph number.

Fortunately, ttf2tfm's -w enco@Unicode@ notation seems to address this in most of the old test cases I tried.

adam

--
Adam T. Lindsay, Computing Dept.     atl@comp.lancs.ac.uk
Lancaster University, InfoLab21      +44(0)1524/510.514
Lancaster, LA1 4WA, UK               Fax:+44(0)1524/510.492

Hans Hagen

4:12 p.m.

New subject: Chinese

Adam Lindsay wrote:

...

Hans Hagen wrote:

...
what we need is a set of encoding files like

/UniEncoding52 [ .... /uni52DF /uni52E0

I hate to be negative, but I have doubts about how generic this approach may be. In some tentative experiments, I discovered that many (most?) CJK fonts don't use traditional postscript names, but rather map from unicode to an indexed glyph number.

Fortunately, ttf2tfm's -w enco@Unicode@ notation seems to address this in most of the old test cases I tried.

afaik pdftex can handle the indexXXXX and unicXXXX entries as alternatives for glyphnames

Hans

Adam Lindsay

4:29 p.m.

New subject: Chinese

Hans Hagen wrote:

...

Adam Lindsay wrote:

...
Fortunately, ttf2tfm's -w enco@Unicode@ notation seems to address this in most of the old test cases I tried.

afaik pdftex can handle the indexXXXX and unicXXXX entries as alternatives for glyphnames

Yes. Sorry I wasn't clear on that. It's just that ttf2tfm is the tool that does a good job at extracting those entries when other tools fail.
