Dear Hans and other friends,

I have been playing a little with MKIV's Chinese support and found it incomplete: it has some serious problems and lacks important features. I know from mk.pdf that Chinese support is under construction, so in this email I just want to offer some suggestions on horizontal Chinese typesetting for its development (vertical typesetting is used more often in Hong Kong, Taiwan and Japan, so maybe people from there are eager to help).

First, a feature many Chinese people like. Many Chinese users write essays using full-width punctuation (that is, a punctuation mark is the same size as a Chinese character), and I found that the punctuation marks are compressed well in ConTeXt, with good penalty settings (although there are minor problems in the PDF). However, it is sometimes up to the publisher which kind of punctuation is used, and many scientific books and papers use half-width punctuation. So I think a new feature should be added that maps all the Chinese punctuation marks to English ones and, at the same time, adds a space after each English punctuation mark. That is, it changes

    中文，中文

into

    中文, 中文

Notice the space in the second example. There is an OpenType feature called hwid, but it should not be used because:

- it changes the English letters as well
- in most Chinese fonts the feature is missing or buggy
- it does not automatically add the space

These are the minor problems in mk.pdf:

- p. 118, penultimate example, box 2, line 1: the ' punctuation mark should not appear at the end of the line
- p. 118, last example, box 2, line 2: in fact, for perfect Chinese typesetting, every punctuation mark that begins or ends a line should sit close to the margin, which makes the type block look good (especially when you look closely at its left or right edge), whereas this ' has a lot of space before it.
In English this is quite easy, and good old TeX does a good job because all the letters and punctuation marks are designed carefully. In Chinese things are much more difficult because all the glyphs in a Chinese font, including the punctuation marks, have the same width, and what makes the problem worse is that different Chinese fonts place these marks at different positions. So a dynamic method that calculates the leftmost and rightmost edges of a punctuation glyph should be used, and it is not available in LuaTeX :( Will it be added later?

Second, and most important, bilingual typesetting. Nowadays English words are frequently used in Chinese text, so great care must be taken in handling them. Many Chinese users write their manuscripts in one of two ways:

    中文 English 中文

or

    中文English中文

The first looks good in an editor with a monospaced font, and the second is convenient to type, so both groups of users will be happy if both ways give the same result in ConTeXt. In MKII, users are forced to write the following, which drives them crazy and should be avoided:

    中文English\ 中文

A small skip should be left between Chinese and English, which makes the result much better. Usually the space is a quarter of the width of a Chinese character. In TeX it would look like:

    \hspace{0.25em plus 0.125em minus 0.08em}

but I think it is better to introduce a \setupxxxwidth command for the user.

The last important point for bilingual English and Chinese typesetting: do not use the English glyphs in Chinese fonts, because:

- the English glyphs in most current Chinese fonts are ugly
- users have no right to modify the Chinese fonts
- many other reasons

So when setting up a Chinese typeface, it is better to leave the user an option to choose the accompanying English typeface. During typesetting, Chinese typography rules should not interfere with the English ones; that is, the Chinese part and the English part each use their own line-breaking model.
Here are two bugs in the recent ConTeXt:

- 中文 English English English 中文 produces 中文EnglishEnglishEnglish中文; the spaces between the English words are all missing.
- The following script produces an error: Invalid field id penalty for node type glyph (1). Here is a sample:

    \definefontfeature[chinese][mode=node,script=hang,lang=zht,script=hani,lang=dlft]
    \definefontsynonym[songti][name:AdobeSongStd-Light][features=Chinese]
    \definefontsynonym[Serif][songti]
    \definetypeface[song][rm][serif][songti]
    \setupbodyfont[song,12pt,rm]
    \starttext
    中文 English: 中文
    \stoptext

Third, indenting. Chinese paragraphs are indented by two Chinese characters, and the first paragraph is indented too. ConTeXt is so powerful that this kind of thing is easy to do, thank you :) And I think it would be better to build it into the Chinese typesetting module?

Urgency: bilingual typesetting > fix the wrong penalty > map to English punctuation marks as an alternative > indenting > punctuation marks close to the margins.

I am eager to see full support for Chinese and other Asian languages (and I am eager to help too), and I hope my suggestions here are helpful to the developers. I am grateful to Hans, Taco and many others (e.g. Zhichu Chen) who have done a very good job on the recent MKIV Chinese support. Thank you for your time, effort and continuous contributions!

Yue Wang
Hello,

Thanks for this comprehensive review. If I'm not mistaken, there is no specific code for CJKV typesetting in Mark IV; the examples in mk.pdf seem to use the generic font loading mechanism.

I would like to answer more completely, but don't have much time for the moment. About some of your remarks:
so I think a new feature should be added to map all the Chinese puncts into english while at the same time, a space should be added after the English punct marks.
Would it not be better to automatically add shrinkable glue after Chinese punctuation, rather than replacing the character by force? This would be very much in line with the general TeX philosophy of setting text (and would probably suppress the need for half-width forms in the font altogether).
- pp118, penultimate example, box 2, line1, the ' punct mark should not appear at the end of the line
This should be taken care of by adding an appropriate penalty before the character.
- pp118, ultimate example, box 2, line2, in fact, if you want do perfect Chinese typesetting, all the puncts which begin a line or end a line should be closed to the margin line
Do you mean simply closer to the margin, or in the margin itself (protruding)? Protruding is already possible in pdfTeX; I believe it is available in LuaTeX as well, although it might be broken for the moment (Taco?). Setting the character closer to the margin should be possible as well, as a modified form of protruding, I trust.
A small skip should be left between Chinese and English which makes the result much better. usually the space is a quarter of a chinese character width. A TeX expression should like: \hspace{0.25em plus 0.125em minus 0.08em}
Again, this can be taken care of by automatically adding this glue between pairs of characters of the appropriate categories.
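The pairing rule can be sketched outside TeX to make the intent concrete. The following Python fragment is only an illustration, not ConTeXt or LuaTeX code: the `GLUE` marker, the function names and the very coarse character classes (main CJK Unified Ideographs block versus ASCII letters and digits) are all assumptions made for the example. It treats a typed space between a Han and a Latin character as equivalent to no space, so both input styles from the original message give the same result.

```python
# Hypothetical marker standing in for a glue node of
# 0.25em plus 0.125em minus 0.08em (the values from the original message).
GLUE = "<glue>"


def is_han(ch):
    """True for characters in the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def is_latin(ch):
    """True for ASCII letters and digits."""
    return ch.isascii() and ch.isalnum()


def is_boundary(a, b):
    """True when a and b form a Han/Latin or Latin/Han pair."""
    return (is_han(a) and is_latin(b)) or (is_latin(a) and is_han(b))


def insert_cjk_latin_glue(text):
    """Insert GLUE at every Han/Latin boundary.

    A single typed space between a Han and a Latin character is dropped
    first; spaces between Latin words are kept.
    """
    # Pass 1: drop a typed space that separates a Han and a Latin character.
    chars = []
    for i, ch in enumerate(text):
        if ch == " " and 0 < i < len(text) - 1:
            if is_boundary(text[i - 1], text[i + 1]):
                continue
        chars.append(ch)
    # Pass 2: insert the glue marker at every Han/Latin boundary.
    out = []
    for i, ch in enumerate(chars):
        if i > 0 and is_boundary(chars[i - 1], ch):
            out.append(GLUE)
        out.append(ch)
    return "".join(out)
```

In a real implementation the marker would be a glue node inserted by a node processor, and the character classes would come from a proper table rather than two range checks.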
The last important thing for English and Chinese bi-lingual typesetting is that: do not use English glyphs in Chinese fonts
Sure, there should be a possibility of specifying a Western font to be used inside Chinese text.
- the following script produce an error: Invalid field id penalty for node type glyph (1).
I don't have that error here. This is a very big font; are you sure it has been read entirely and correctly written to the cache? Lua crashed on my machine when I first compiled your example, and only a partial font hash was written to the cache (ConTeXt didn't crash, so the first compilation apparently ended well, but the cache was already filled with a partial font). I can imagine that problems will arise in the presence of a partially hashed font in the cache.

Anyway, the code looks quite weird to me:
\definefontfeature[chinese][mode=node,script=hang,lang=zht,script=hani,lang=dlft]
This means that you activate two different scripts at the same time (hang == Hangul and hani == Han ideographs), and also two languages at the same time (zht == Chinese Traditional, and dlft is probably a typo for dflt == default). I can't imagine what that is supposed to mean, and activating Traditional Chinese is probably wrong with Adobe Song Std, which is a Simplified Chinese font. A saner definition of that feature would be, in my opinion:

    \definefontfeature[chinese-traditional][mode=node,script=hani,lang=zhs]

I know this code comes from mk.pdf, but I think it is a mistake.

Finally, there is an interesting article by Jin-Hwan Cho (the dvipdfmx author) and Haruhiko Okumura about CJKV typesetting with Omega, written a couple of years ago. They implemented all of the rules you mention above and a bit more; and although they used OTPs at the time, it should be quite straightforward to transpose it into Lua code (actually, I did it a couple of months ago, but I used plain LuaTeX; in ConTeXt it should probably be done using node processors or something).

http://project.ktug.or.kr/omega-cjk/tug2004-preprint.pdf

Arthur
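The repeated keys in that feature list are easy to spot mechanically. As a small illustration (plain Python, unrelated to ConTeXt's own key-value parser; the function name is made up for this sketch), one can split the specification and report every key that is given more than once. How ConTeXt itself resolves such duplicates is not verified here.

```python
from collections import defaultdict


def duplicate_keys(spec):
    """Return {key: [values, ...]} for keys that occur more than once
    in a comma-separated key=value list."""
    seen = defaultdict(list)
    for item in spec.split(","):
        key, _, value = item.strip().partition("=")
        seen[key].append(value)
    return {k: v for k, v in seen.items() if len(v) > 1}


spec = "mode=node,script=hang,lang=zht,script=hani,lang=dlft"
print(duplicate_keys(spec))
# {'script': ['hang', 'hani'], 'lang': ['zht', 'dlft']}
```

Both `script` and `lang` show up twice in the sample, which is exactly the ambiguity pointed out above.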
Arthur Reutenauer wrote:
Thanks for this comprehensive review. If I'm not mistaken, there is no specific code for CJKV typesetting in Mark IV; the examples in mk.pdf seem to use the generic font loading mechanism.
I would like to answer more completely, but don't have much time for the moment. About some of your remarks:
actually, there is code in there but you need to specify chinese as feature

    \definefontfeature [chinese-traditional] [mode=node,script=hang,lang=zht]
    \definefontfeature [chinese-simple] [mode=node,script=hang,lang=zhs]
so I think a new feature should be added to map all the Chinese puncts into english while at the same time, a space should be added after the English punct marks.
Would it not be better to automatically add shrinkable glue after Chinese punctuation, rather than replacing the character by force? This would be very much in line with the general TeX philosophy of setting text (and would probably suppress the need for half-width forms in the font altogether).
there are penalties and glue nodes injected (based on specs given by some users)
- pp118, penultimate example, box 2, line1, the ' punct mark should not appear at the end of the line
probably an old mk.pdf (i'm awaiting some feedback before i post a new one)
This should be taken care of by adding an appropriate penalty before the character.
adding penalties is done based on a couple of tables
- pp118, ultimate example, box 2, line2, in fact, if you want do perfect Chinese typesetting, all the puncts which begin a line or end a line should be closed to the margin line
Do you mean simply closer to the margin, or in the margin itself (protruding)? Protruding is already possible in pdfTeX; I believe it is available in LuaTeX as well, although it might be broken for the moment (Taco?). Setting the character closer to the margin should be possible as well, as a modified form of protruding, I trust.
this is always a bit of a trade-off; i use samples with a small width, so at some point you run into tex optimizing situations; i'll make things configurable
A small skip should be left between Chinese and English which makes the result much better. usually the space is a quarter of a chinese character width. A TeX expression should like: \hspace{0.25em plus 0.125em minus 0.08em}
Again, this can be taken care of by automatically adding this glue between pairs of character of the appropriate category.
The last important thing for English and Chinese bi-lingual typesetting is that: do not use English glyphs in Chinese fonts
Sure, there should be a possibility of specifying a Western font to be used inside Chinese text.
font switching; i still have to look into mixed fonts
- the following script produce an error: Invalid field id penalty for node type glyph (1).
I don't have that error here. This is very big font; are you sure it has been read entirely and correctly written to the cache? Lua crashed on my machine when I first compiled your example, and only a partial font hash was written to the cache (ConTeXt didn't crash, so the first compilation apparently ended well, but the cache was already filled with a partial font). I can imagine that problems will arise in the presence of a partially hashed font in the cache.
Anyway, the code looks quite weird to me:
\definefontfeature[chinese][mode=node,script=hang,lang=zht,script=hani,lang=dlft]
This means that you activate two different scripts at the same time (hang == Hangul and hani == Han ideographs), and also two languages at the same time (zht == Chinese Traditional and dlft is probably a typo for dflt == default). I can't imagine what that is supposed to mean, and activating Traditional Chinese is probably wrong with Adobe Song Std which is a Simplified Chinese font. A saner definition of that feature would be in my opinion:
indeed this disables chinese ...
\definefontfeature[chinese-traditional][mode=node,script=hani,lang=zhs]
I know this code comes from mk.pdf, but I think it is a mistake.
Finally, there is an interesting article by Jin-Hwan Cho (the dvipdfmx author) and Haruhiko Okumura about CJKV typesetting with Omega a couple of years ago. They have implemented all of the rules you mention above and a bit more; and although they used OTPs at the time, it should be quite straightforward to transpose it in Lua code (actually, I've done it a couple of months ago, but I have used plain LuaTeX, and in ConTeXt it should probably be done using node processors or something).
indeed
i'll have a look

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
| www.pragma-pod.nl
-----------------------------------------------------------------
actually, there is code in there but you need to specify chinese as feature
    \definefontfeature [chinese-traditional] [mode=node,script=hang,lang=zht]
    \definefontfeature [chinese-simple] [mode=node,script=hang,lang=zhs]
OK, but "hang" should still be replaced by "hani" if you want to use OpenType features for Chinese (Traditional or Simplified).

Arthur
On Jan 28, 2008 3:17 AM, Arthur Reutenauer wrote:
Hello,
Thanks for this comprehensive review. If I'm not mistaken, there is no specific code for CJKV typesetting in Mark IV; the examples in mk.pdf seem to use the generic font loading mechanism.
This is wrong: font-otf contains a few Lua functions for line breaking, and char-def has information about character width (full width, half width, ...) and other properties such as opening punctuation and parentheses, but none of it is finished.
I would like to answer more completely, but don't have much time for the moment. About some of your remarks:
so I think a new feature should be added to map all the Chinese puncts into english while at the same time, a space should be added after the English punct marks.
Would it not be better to automatically add shrinkable glue after Chinese punctuation, rather than replacing the character by force? This would be very much in line with the general TeX philosophy of setting text (and would probably suppress the need for half-width forms in the font altogether).
- pp118, penultimate example, box 2, line1, the ' punct mark should not appear at the end of the line
This should be taken care of by adding an appropriate penalty before the character.
- pp118, ultimate example, box 2, line2, in fact, if you want do perfect Chinese typesetting, all the puncts which begin a line or end a line should be closed to the margin line
Do you mean simply closer to the margin, or in the margin itself (protruding)? Protruding is already possible in pdfTeX; I believe it is available in LuaTeX as well, although it might be broken for the moment (Taco?). Setting the character closer to the margin should be possible as well, as a modified form of protruding, I trust.
A small skip should be left between Chinese and English which makes the result much better. usually the space is a quarter of a chinese character width. A TeX expression should like: \hspace{0.25em plus 0.125em minus 0.08em}
Again, this can be taken care of by automatically adding this glue between pairs of character of the appropriate category.
The last important thing for English and Chinese bi-lingual typesetting is that: do not use English glyphs in Chinese fonts
Sure, there should be a possibility of specifying a Western font to be used inside Chinese text.
This could be done with virtual fonts, but we need an interface.
- the following script produce an error: Invalid field id penalty for node type glyph (1).
I don't have that error here. This is very big font; are you sure it has been read entirely and correctly written to the cache? Lua crashed on my machine when I first compiled your example, and only a partial font hash was written to the cache (ConTeXt didn't crash, so the first compilation apparently ended well, but the cache was already filled with a partial font). I can imagine that problems will arise in the presence of a partially hashed font in the cache.
Anyway, the code looks quite weird to me:
\definefontfeature[chinese][mode=node,script=hang,lang=zht,script=hani,lang=dlft]
This means that you activate two different scripts at the same time (hang == Hangul and hani == Han ideographs), and also two languages at the same time (zht == Chinese Traditional and dlft is probably a typo for dflt == default). I can't imagine what that is supposed to mean, and activating Traditional Chinese is probably wrong with Adobe Song Std which is a Simplified Chinese font. A saner definition of that feature would be in my opinion:
\definefontfeature[chinese-traditional][mode=node,script=hani,lang=zhs]
You need the hang script; it takes care of the line breaking.
I know this code comes from mk.pdf, but I think it is a mistake.
Finally, there is an interesting article by Jin-Hwan Cho (the dvipdfmx author) and Haruhiko Okumura about CJKV typesetting with Omega a couple of years ago. They have implemented all of the rules you mention above and a bit more; and although they used OTPs at the time, it should be quite straightforward to transpose it in Lua code (actually, I've done it a couple of months ago, but I have used plain LuaTeX, and in ConTeXt it should probably be done using node processors or something).
This is currently done in font-otf.lua.

Greetings,
Wolfgang
This is wrong, font-otf contains a few lua macros about linebreaking and char-def has information about the character width (full width, half width ...) and other information like opening punctuation, parenthesis but none of them is finished.
OK, I thought line breaking would be managed in node-*, so I didn't look in font-otf for it.
Sure, there should be a possibility of specifying a Western font to be used inside Chinese text.
Could be done with virtual fonts but we need an interface.
Sure, no need to rush things.
You need the hang script, it takes care about the linebreak.
What do you mean? How does it take care about the linebreak? And how can it be relevant for Chinese characters? Default Chinese fonts from Adobe like AdobeSongStd don't have a "hang" script at all anyway. Do you know fonts that have?

Arthur
Arthur Reutenauer wrote:
You need the hang script, it takes care about the linebreak.
What do you mean? How does it take care about the linebreak? And how can it be relevant for Chinese characters? Default Chinese fonts from Adobe like AdobeSongStd don't have a "hang" script at all anyway. Do you know fonts that have?
hey, i just gambled ... it's you who have to tell me what script/lang combinations to use; i just needed a value to kickstart the analyser and nobody bothered to correct me (same for arab, i just picked some)

you don't seriously think that i can read chinese eh?

Hans
Arthur Reutenauer wrote:
Adobe like AdobeSongStd don't have a "hang" script at all anyway. Do you know fonts that have?
btw, the same is true for japanese and korean ... i like these glyphs and playing with them but i need input from users on how to organize things, i.e. script/lang combinations and rules for treating them so that i can write the analyzers

Hans
On Jan 29, 2008 12:19 AM, Arthur Reutenauer wrote:
This is wrong, font-otf contains a few lua macros about linebreaking and char-def has information about the character width (full width, half width ...) and other information like opening punctuation, parenthesis but none of them is finished.
OK, I thought line breaking would be managed in node-*, so I didn't look in font-otf for it.
Sure, there should be a possibility of specifying a Western font to be used inside Chinese text.
Could be done with virtual fonts but we need an interface.
Sure, no need to rush things.
You need the hang script, it takes care about the linebreak.
What do you mean? How does it take care about the linebreak? And how can it be relevant for Chinese characters? Default Chinese fonts from Adobe like AdobeSongStd don't have a "hang" script at all anyway. Do you know fonts that have?
You need the hang script in \definefontfeature to enable ConTeXt's line breaking for CJK; don't ask me why you have to use it as the value for script.

Wolfgang
Arthur Reutenauer wrote:
Do you mean simply closer to the margin, or in the margin itself (protruding)? Protruding is already possible in pdfTeX; I believe it is available in LuaTeX as well, although it might be broken for the moment (Taco?).
Protrusion should be available in luatex as well, but it may be incompatible with the mkiv code.

Best wishes,
Taco
Taco Hoekwater wrote:
Arthur Reutenauer wrote:
Do you mean simply closer to the margin, or in the margin itself (protruding)? Protruding is already possible in pdfTeX; I believe it is available in LuaTeX as well, although it might be broken for the moment (Taco?).
Protrusion should be available in luatex as well, but it may be incompatible with the mkiv code.
i'm not going to waste time on protruding in mkiv, later this year we will have proper font related protruding and hz tables and then i will pick up that thread

Hans
i'm not going to waste time on protruding in mkiv, later this year we will have proper font related protruding and hz tables and then i will pick up that thread
Anyway, if you read Yue's reply, he says the glyphs should not protrude in Chinese anyway ;-) But I'm pretty sure they can in Japanese (for some typographers at least).

Arthur
Thank you very much for your mail!
On Mon, Jan 28, 2008 at 10:17 AM, Arthur Reutenauer wrote:
Hello,
Thanks for this comprehensive review. If I'm not mistaken, there is no specific code for CJKV typesetting in Mark IV; the examples in mk.pdf seem to use the generic font loading mechanism.
Yes, there is. See the last part of font-otf.lua.
I would like to answer more completely, but don't have much time for the moment. About some of your remarks:
Thank you for your time and effort:)
so I think a new feature should be added to map all the Chinese puncts into english while at the same time, a space should be added after the English punct marks.
Would it not be better to automatically add shrinkable glue after Chinese punctuation, rather than replacing the character by force? This would be very much in line with the general TeX philosophy of setting text (and would probably suppress the need for half-width forms in the font altogether).
Sorry, I made a mistake here, forgive me. According to the official Chinese rules, Chinese punctuation marks should not be mapped to English ones. But there are two kinds of full stop in Chinese: one is a circle, the other is a dot, and in Chinese scientific typesetting we should usually map the circle full stop to the dot.
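As a tiny illustration of that mapping (plain Python, not ConTeXt code): the sketch below assumes the "circle" is U+3002 IDEOGRAPHIC FULL STOP and the "dot" is U+FF0E FULLWIDTH FULL STOP. The target code point is an assumption made for the example; the message above does not name one.

```python
# U+3002 IDEOGRAPHIC FULL STOP (circle) -> U+FF0E FULLWIDTH FULL STOP (dot).
# The choice of U+FF0E as the "dot" is an assumption for this sketch.
CIRCLE_TO_DOT = {0x3002: 0xFF0E}


def use_dot_full_stop(text):
    """Replace the circle full stop by the dot; leave everything else alone."""
    return text.translate(CIRCLE_TO_DOT)
```

All other punctuation, including the full-width comma, passes through unchanged.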
- pp118, penultimate example, box 2, line1, the ' punct mark should not appear at the end of the line
This should be taken care of by adding an appropriate penalty before the character.
You are right :) There must be some problem in the penalty settings in font-otf.lua, but I need some time to trace it. I think we should do something after the three elseif branches at lines 4563, 4579 and 4588.
- pp118, ultimate example, box 2, line2, in fact, if you want do perfect Chinese typesetting, all the puncts which begin a line or end a line should be closed to the margin line
Do you mean simply closer to the margin, or in the margin itself (protruding)? Protruding is already possible in pdfTeX; I believe it is available in LuaTeX as well, although it might be broken for the moment (Taco?). Setting the character closer to the margin should be possible as well, as a modified form of protruding, I trust.
Closer to the margin, not in the margin. It is possible, but we don't know how much width to adjust because punctuation marks sit at different positions in different fonts. Of course, we can choose an adjustment that suits most fonts.
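If the glyph's bounding box were available, the adjustment could be computed per font instead of guessed. The following Python sketch shows only the arithmetic; the `GlyphMetrics` record, the function names and the sample numbers are hypothetical, and nothing here reflects what LuaTeX or font-otf.lua actually exposes.

```python
from dataclasses import dataclass


@dataclass
class GlyphMetrics:
    advance: float  # advance width, in font units
    xmin: float     # left edge of the ink (bounding box)
    xmax: float     # right edge of the ink (bounding box)


def side_space(m):
    """Return (left, right) blank space inside the glyph's advance width."""
    return m.xmin, m.advance - m.xmax


def margin_kerns(m, at_line_start, at_line_end):
    """Negative kerns that pull the ink up to the margin, never past it."""
    left, right = side_space(m)
    return (-left if at_line_start else 0.0, -right if at_line_end else 0.0)
```

For a made-up opening quote whose ink spans 600 to 950 units in a 1000-unit em, `side_space` gives 600 units of blank space on the left and 50 on the right, so at the start of a line the mark would be moved left by 600 units to sit at the margin without protruding.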
A small skip should be left between Chinese and English which makes the result much better. usually the space is a quarter of a chinese character width. A TeX expression should like: \hspace{0.25em plus 0.125em minus 0.08em}
Again, this can be taken care of by automatically adding this glue between pairs of character of the appropriate category.
Yes, and I think they should be added into font-otf.lua as well.
The last important thing for English and Chinese bi-lingual typesetting is that: do not use English glyphs in Chinese fonts
Sure, there should be a possibility of specifying a Western font to be used inside Chinese text.
Yes, and I think there should be an option left for users when they set up their accompanying fonts.
- the following script produce an error: Invalid field id penalty for node type glyph (1).
I don't have that error here. This is very big font; are you sure it has been read entirely and correctly written to the cache? Lua crashed on my machine when I first compiled your example, and only a partial font hash was written to the cache (ConTeXt didn't crash, so the first compilation apparently ended well, but the cache was already filled with a partial font). I can imagine that problems will arise in the presence of a partially hashed font in the cache.
I am sure Lua parsed it correctly (I get the tma and tmc files in the cache). I am using the 01.16 beta.
Anyway, the code looks quite weird to me:
\definefontfeature[chinese][mode=node,script=hang,lang=zht,script=hani,lang=dlft]
This means that you activate two different scripts at the same time (hang == Hangul and hani == Han ideographs), and also two languages at the same time (zht == Chinese Traditional and dlft is probably a typo for dflt == default). I can't imagine what that is supposed to mean, and activating Traditional Chinese is probably wrong with Adobe Song Std which is a Simplified Chinese font. A saner definition of that feature would be in my opinion:
\definefontfeature[chinese-traditional][mode=node,script=hani,lang=zhs]
I know this code comes from mk.pdf, but I think it is a mistake.
Umm... it is a mess. What does hang mean? Maybe it refers to fonts.analyzers.methods.hang and fonts.analyzers.methods.hani in font-otf.lua (lines 4505 and 4583), which are used to adjust the penalty between different CJK categories?
Finally, there is an interesting article by Jin-Hwan Cho (the dvipdfmx author) and Haruhiko Okumura about CJKV typesetting with Omega a couple of years ago. They have implemented all of the rules you mention above and a bit more; and although they used OTPs at the time, it should be quite straightforward to transpose it in Lua code (actually, I've done it a couple of months ago, but I have used plain LuaTeX, and in ConTeXt it should probably be done using node processors or something).
Thank you for the link. In fact, many of the rules appear in the last part of font-otf.lua, but the implementation is incomplete. Chinese typesetting is easier than English typesetting because in Chinese we can break the line at any character and no hyphenation algorithm is needed. The only issues are the spaces between punctuation marks and the penalties before and after them. When English words are introduced, we should also take font switching and the glue between Chinese and English words into account.
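The "break anywhere except around punctuation" rule can be written down in a few lines. This is a Python sketch of the idea only, not the code in font-otf.lua; the two character sets are small illustrative samples and the function name is made up. In TeX terms, a forbidden position would get a high penalty rather than simply being dropped from a list.

```python
# Small illustrative sets; a real table would be much larger.
NO_BREAK_BEFORE = set("，。、；：！？）》」』”’")  # must not start a line
NO_BREAK_AFTER = set("（《「『“‘")                 # must not end a line


def break_positions(text):
    """Return every index i such that a line break is allowed
    between text[i - 1] and text[i]."""
    return [
        i
        for i in range(1, len(text))
        if text[i] not in NO_BREAK_BEFORE and text[i - 1] not in NO_BREAK_AFTER
    ]
```

For the string 中文，中文 this allows a break after the first character, after the comma and before the last character, but not between 文 and the comma. Embedded English words would need their own rules on top of this.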
http://project.ktug.or.kr/omega-cjk/tug2004-preprint.pdf
Arthur
participants (5)
- Arthur Reutenauer
- Hans Hagen
- Taco Hoekwater
- Wolfgang Schuster
- Yue Wang