UTF8 problems with Hangul Syllables

18 Dec 2002

      Here are a comment and a question about the new ConTeXt feature that
supports the UTF8 encoding.

I tested the following short ConTeXt document containing two Korean
characters. In its second line I selected the Bitstream Cyberbit font; the
corresponding TFM files were generated by ttf2tfm with Unicode.sfd
(the same way as for the UTF8 support in CJK-LaTeX).

\enableregime [utf]
\definefontsynonym [UnicodeRegular] [cyberb]
\chardef\utfunihashmode=1
\starttext
^^eb^^bf^^a1
^^ec^^80^^80
\stoptext

Here, ^^eb^^bf^^a1 = U+BFE1 and ^^ec^^80^^80 = U+C000. 
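
The two code points above can be checked with a small Python sketch. It is
only an illustration and not part of ConTeXt; the helper names caret_to_bytes
and decode_codepoint are made up for this example.

```python
def caret_to_bytes(text: str) -> bytes:
    """Turn TeX ^^xx notation (two hex digits per byte) into raw bytes."""
    # "^^eb^^bf^^a1".split("^^") gives ['', 'eb', 'bf', 'a1']; drop the empty head.
    hex_pairs = text.split("^^")[1:]
    return bytes(int(pair, 16) for pair in hex_pairs)


def decode_codepoint(text: str) -> int:
    """Decode one UTF-8 encoded character given in ^^xx notation."""
    return ord(caret_to_bytes(text).decode("utf-8"))


for sequence in ("^^eb^^bf^^a1", "^^ec^^80^^80"):
    print(sequence, "= U+%04X" % decode_codepoint(sequence))
```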

1. Without the third line (\chardef\utfunihashmode=1), I could not see
   any characters. Why?

2. After enabling \utfunihashmode, I could see the first character but
   not the second. The only difference I found was the value of \unidiv:
   191 for the first character and 192 for the second.
   In fact, no character with \unidiv >= 192 and \unidiv <= 223
   (U+C000 to U+DFFF, which includes about half of the Hangul Syllables)
   was shown correctly. Why?
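
The arithmetic behind the second question can be sketched in Python as well.
I assume here that \unidiv is the code point divided by 256 (its high byte),
which matches the values 191 and 192 reported above. The function name unidiv
and the block limits U+AC00 and U+D7A3 (the Unicode Hangul Syllables block)
are supplied for this illustration only.

```python
HANGUL_FIRST = 0xAC00  # first code point of the Hangul Syllables block
HANGUL_LAST = 0xD7A3   # last code point of the Hangul Syllables block


def unidiv(codepoint: int) -> int:
    """High byte of a BMP code point (assumed meaning of \\unidiv)."""
    return codepoint // 256


# The two test characters sit on either side of the boundary.
print(unidiv(0xBFE1), unidiv(0xC000))

# Count the Hangul syllables whose high byte lies in the failing range.
affected = [cp for cp in range(HANGUL_FIRST, HANGUL_LAST + 1)
            if 192 <= unidiv(cp) <= 223]
total = HANGUL_LAST - HANGUL_FIRST + 1
print(len(affected), "of", total, "syllables have 192 <= unidiv <= 223")
```

The values 192 to 223 (0xC0 to 0xDF) have the same 110xxxxx bit pattern as
the lead byte of a two-byte UTF8 sequence. I do not know whether this is
related to the problem.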

Anyway, it is now possible to get a PDF file containing several different
languages with ConTeXt + dvipdfmx. Furthermore, the text in the PDF file
can be searched and extracted. Bookmarks and text annotations work too!

I used the following map entry (usually in cid-x.map) for dvipdfmx.

cyberb@Unicode@ Identity-H :0:cyberbit.ttf

Best, ChoF.
-- 
~~~~~~~~~~~~~~~~~~~~~~~~~     ***
| Cho, Jin-Hwan == ChoF |     ^ ^
~~~~~~~~~~~~~~~~~~~~~~~~~      o
| Research Fellow       |     ~~~
| School of Mathematics ~~~~~~~~~~~~~~
| Korea Institute for Advanced Study |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| chofchof@ktug.or.kr                |
| http://free.kaist.ac.kr/ChoF/      |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
