Matthew Huggett wrote:
I asked about Japanese a while back. Hans requested more information on encodings, fonts, etc. I don't know enough about these things or ConTeXt to know what is needed exactly.
From what I've read, Unicode is not that popular in Japan itself. ...
Unicode wasn't that popular because Unix-like operating systems used EUC as their encoding, while Microsoft used its own Shift-JIS encoding. So there is still a lot of digital text out there written in these encodings, and a lot of tools still use them. But if you want to write new texts, using Unicode shouldn't be a problem for most users. I guess that most editors supporting Asian encodings can also save as UTF-8; nowadays it's easier to find a Unicode-enabled editor than a Shift-JIS/EUC one (well, on Windows anyway...). Since ConTeXt already supports UTF-8, I don't see a reason to make things more difficult than they already are by writing text in other encodings.

When I look at the source of the Chinese module, the most difficult part for me to understand is the font encoding part: the enco-chi.tex file and the use of \defineuclass in that file. I guess it has something to do with mapping the written text onto the font. If I understand correctly, the Chinese module doesn't use Unicode fonts but GBK- or Big5-encoded fonts. I guess that a Japanese module done the same way would need to support JIS- or Shift-JIS-encoded fonts.

On the other hand, maybe we don't need to support that, since plenty of Japanese Unicode fonts are available. On WinXP there are msmincho.ttc and msgothic.ttc, which are both Unicode fonts. I also use kochi-mincho.ttf and kochi-gothic.ttf, which are freely available Japanese Unicode fonts, and Cyberbit is a Unicode font as well. The commercially available fonts from Dynalab (the Dynafont Japanese TrueType collection is quite cheap and very good) are Unicode fonts too. Again, I don't think we should make it difficult for ourselves by trying to support non-Unicode fonts while Unicode Japanese fonts are easy to use and widely available.
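To give an idea of what I mean, something along these lines is roughly what I have in mind for the Kochi fonts. Nothing below ships with ConTeXt: the typescript name, the font synonyms and the lowercase file names are just guesses, and a proper map file (and probably a subfont setup for the .ttc files) would still be needed.

  \enableregime[utf]                 % interpret the input file as UTF-8

  % hypothetical typescript for the free Kochi fonts
  \starttypescript [serif] [kochi]
    \definefontsynonym [KochiMincho] [kochimincho]
    \definefontsynonym [Serif]       [KochiMincho]
  \stoptypescript

  \starttypescript [sans] [kochi]
    \definefontsynonym [KochiGothic] [kochigothic]
    \definefontsynonym [Sans]        [KochiGothic]
  \stoptypescript

  % couple the typescripts to a typeface we can switch to later
  \definetypeface [japanese] [rm] [serif] [kochi] [default]
  \definetypeface [japanese] [ss] [sans]  [kochi] [default]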
Typesetting Japanese could be more complicated than Chinese because of the concurrent use of four writing systems.
The fact that Japanese uses four writing systems is not really a problem: Hiragana and Katakana (Kana) simply live in different Unicode ranges than the Kanji/Chinese characters. Things only get difficult if you want to use a different font for Kana than for Kanji; then you would need to assign a different font to each Unicode range, but I have no idea why anybody would want to do such a thing! Just using Unicode and a Japanese Unicode font takes care of things. If you type Romaji/Latin characters in the example I posted yesterday, they get printed in CMR, and in some tests I could change that to any other font I wanted just by using the normal ConTeXt font mechanisms. So I guess it is easy to mix Japanese fonts with normal Latin fonts.
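For example, assuming the hypothetical "kochi" typescript and the "japanese" typeface from the sketch above, a stripped-down test document would look more or less like this:

  \starttext
  \switchtobodyfont[japanese]        % Kanji and Kana come from the Japanese font
  日本語のテキスト
  {\switchtobodyfont[cmr] some Latin text in Computer Modern}
  and back to the Japanese font: ひらがな
  \stoptext

Switching back and forth between the Japanese typeface and a Latin one is then just the normal bodyfont switching.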
I guess I need to track down a few sample documents. I tried to turn up some info on Japanese typesetting rules but had no luck.
The only info I have is from Ken Lunde's CJKV book, where he mentions some rules about CJK line breaking; also, some characters are allowed to protrude into the right margin. I have some OTPs for Omega which handle all of this. They can be seen here: http://www.math.jussieu.fr/~zoonek/LaTeX/Omega-Japanese/doc.html At first I wanted to use Omega with ConTeXt so that I could use these OTPs, but Omega isn't really stable.

With the ConTeXt example that I posted yesterday, I am already able to write Japanese in UTF-8, use a Unicode Japanese font in ConTeXt, and get Japanese output. I hope the hard part is already behind me! :-) The only thing that still puzzles me is how to add interglyph space so that TeX can break the lines. If someone can help, I would really appreciate it!

My best,

Tim
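P.S. To illustrate what I mean by interglyph space: I assume the Chinese module inserts some small, stretchable glue between glyphs so that TeX gets break points. Something along these lines is what I have in mind, but the name \CJKglue is just made up here, and the part I am missing is how to hook it in so that it gets inserted after every CJK glyph automatically:

  % small, stretchable glue that allows a line break between CJK glyphs
  \def\CJKglue{\hskip 0pt plus 0.1em minus 0.01em\relax}
  % every CJK glyph would then have to be typeset as <glyph>\CJKglue,
  % so that TeX can break (and stretch a little) between any two glyphs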