Re: [NTG-context] CJK support in ConTeXt
It's time to discuss the topic "CJK support in ConTeXt" in a more public place.
Agreed!
The current ConTeXt release supports UTF-8, and all CJKV locales can be represented in UTF-8, so the mechanism is already in place. IMHO, the real work to be done is to define the set of typesetting conventions (typesetting rules) for each CJKV locale (as the Chinese package did). These vary quite a lot from locale to locale, but all of them are based on the same underlying mechanism.
Best,
Hong
Hong Feng wrote:
The current ConTeXt release supports UTF-8, and all CJKV locales can be represented in UTF-8, so the mechanism is already in place.
IMHO, the real work to be done is to define the set of typesetting conventions (typesetting rules) for each CJKV locale (as the Chinese package did). These vary quite a lot from locale to locale, but all of them are based on the same underlying mechanism.
My very first experiment with Japanese in ConTeXt made use of ConTeXt's built-in support for UTF-8. I soon found out that no line breaking takes place when UTF-8 support is used, so that has to be implemented. I'm not sure, but I remember Hans once telling me that the typesetting mechanism and line breaking algorithm used in the Chinese module cannot be used directly with the UTF-8 support. Therefore I'm not sure we can simply say that the 'mechanism is already done'. Maybe Hans can tell us how difficult it is to add a line breaking mechanism to the UTF-8 support?

It would be really handy if we could use ConTeXt's UTF-8 support, since some of the work for a CJK module would then already be done. On the other hand, a lot of work is also already done if we use e-Omega. We have to make sure that adapting the UTF-8 mechanism doesn't take more time and effort than creating a module based on e-Omega. A CJK module based on e-Omega may be easier to write and more flexible. For example, not everyone can write documents in UTF-8, and e-Omega will accept almost any file encoding as long as there is an OTP available to convert it to Unicode.

So I think the questions we have to ask ourselves are:
Do we write the line breaking and typesetting algorithms for ConTeXt's UTF-8 support or for e-Omega?
How much time and effort is needed to create a CJK module with each solution?
And which solution gives the most powerful typesetting options?

Personally, I would go for the e-Omega option, but I wouldn't mind seeing a module based on ConTeXt's UTF-8 support. As long as there is support for CJK, I'll be happy! :)

My best,
Tim
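
To make the line breaking problem above a bit more concrete: CJK text has no spaces between words, so break points have to be derived from the characters themselves. The following is only a rough sketch in Python, not ConTeXt or e-Omega code. The names `is_cjk` and `break_opportunities` are hypothetical, and the two prohibition sets are small, incomplete samples chosen for illustration. A real implementation would need full per-locale rule sets, which is exactly the per-locale work discussed in this thread.

```python
from typing import List

# Hypothetical illustration only: not ConTeXt or e-Omega code.

# Characters that must not start a line (small, incomplete sample).
NO_LINE_START = set("、。，．）」』】！？ー")
# Characters that must not end a line (small, incomplete sample).
NO_LINE_END = set("（「『【")


def is_cjk(ch: str) -> bool:
    """Rough test for CJK ideographs, kana, and CJK punctuation."""
    cp = ord(ch)
    return (
        0x3000 <= cp <= 0x30FF      # CJK punctuation, hiragana, katakana
        or 0x4E00 <= cp <= 0x9FFF   # CJK Unified Ideographs
        or 0xFF00 <= cp <= 0xFFEF   # full-width forms
    )


def break_opportunities(text: str) -> List[int]:
    """Return every index i such that a line may be broken before text[i]."""
    breaks = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if not (is_cjk(prev) or is_cjk(cur)):
            continue  # leave non-CJK runs to ordinary space-based breaking
        if cur in NO_LINE_START or prev in NO_LINE_END:
            continue  # prohibited position: keep these two characters together
        breaks.append(i)
    return breaks
```

For example, `break_opportunities("日本語、テスト")` allows a break between any two characters except directly before the comma, so the comma can never start a line. Only the two tables would need to change from locale to locale, while the loop itself stays the same.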
participants (2)
- Hong Feng
- Tim 't Hart