DocBookInContext & multi-languages (newbie)
Hello list! I'm new to ConTeXt. After a few books done in LyX (LaTeX) I decided to enhance my "publishing". Since I need to write in DocBook, I was thrilled to find the DocBookInContext package, which makes it possible to map from DocBook to ConTeXt. I prepared a small article in DocBook and converted it to PDF by: texexec --pdf file (I'm running SuSE 8.0 & teTeX).

The problem is that I wanted to include some Croatian national characters in the DocBook file, but they are not shown in the generated file. What should I do to be able to have both English & Croatian in ConTeXt? (In LyX, I would simply use latin-2 encoding and write English & Croatian.)

Usually documents in DocBook have Unicode encoding (UTF-8). What encoding has to be defined so that the DocBook -> ConTeXt conversion will work properly?

I also ran texexec --make --language=hr,en hr

Sincerely, Gour -- Gour gour@mail.inet.hr Registered Linux User #278493
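[For the latin-2 route mentioned above, a minimal sketch of the ConTeXt side - my illustration, untested; it assumes the il2 regime file and the hr hyphenation patterns are installed:

  % interpret the input file as ISO latin-2
  \enableregime[il2]
  % Croatian as main language, for hyphenation
  \mainlanguage[hr]
  \starttext
  Neki hrvatski tekst ... \language[en] some English text ...
  \stoptext

This does not answer the utf-8/DocBook question, which the rest of the thread takes up.]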
On Fri, Nov 29, 2002 at 08:20:39AM +0100, Gour wrote:
Hello list!
Since I need to write in DocBook, I was thrilled to find the DocBookInContext package, which makes it possible to map from DocBook to ConTeXt.
I prepared a small article in DocBook and converted it to PDF by: texexec --pdf file (I'm running SuSE 8.0 & teTeX).
The problem is that I wanted to include some Croatian national characters in the DocBook file, but they are not shown in the generated file.
What should I do to be able to have both English & Croatian in ConTeXt?
(In LyX, I would simply use latin-2 encoding and write English & Croatian.)
Usually documents in DocBook have Unicode encoding (UTF-8).
What encoding has to be defined so that the DocBook -> ConTeXt conversion will work properly?
I also ran texexec --make --language=hr,en hr
I would like to know that too :-) I have not yet found the time to find out how Context deals with encodings. I only have a note that says that one should do \useXMLfilter [utf], and that I should have a look at the xtag-utf (which is input by the above command) or enco files.

I would hope that context develops generic input encoding support, so that I only have to scan the encoding value in the XML declaration, and input the appropriate encoding file.

Regards, Simon -- Simon Pepping email: spepping@scaprea.hobby.nl
Simon Pepping (spepping@scaprea.hobby.nl) wrote:
I would like to know that too :-) I have not yet found the time to find out how Context deals with encodings. I only have a note that says that one should do \useXMLfilter [utf], and that I should have a look at the xtag-utf (which is input by the above command) or enco files.
As far as I can see ConTeXt does not understand utf-8 encoding. Where did you find this note mentioning utf?
I would hope that context develops generic input encoding support, so that I only have to scan the encoding value in the XML declaration, and input the appropriate encoding file.
I just read a report about the problem of publishing Unicode documents - FO & similar converters are not mature enough, PassiveTeX isn't easy to install, Omega has its own problems ...

Some time ago I saw a post on the DocBook list from Sebastian Rahtz, who is considering rewriting PassiveTeX with ConTeXt support instead of LaTeX. However, I'm wondering what is the present route for those wanting to get their utf-8 encoded documents published in ConTeXt?

DocBookInContext is a very valuable tool, enabling one to author documents in DocBook and then get high quality output with ConTeXt & TeX. The question remains: how to do it with a multi-lingual document encoded in utf-8?

Any hint?

Sincerely, Gour
On Saturday, November 30, 2002, at 03:15 PM, Gour wrote:
However, I'm wondering what is the present route for those wanting to get their utf-8 encoded documents published in ConTeXt?
I've been wondering about this myself. Given that the default encoding for XML is utf-8, it'd seem ConTeXt ought to support it if it's going to typeset XML. I posted a note about the tbook project (http://tbookdtd.sf.net/) a couple of weeks ago, which includes this binary (description from the manual):
tbrplent is a filter program that scans for non-ASCII UTF-8 sequences in the input stream and creates decent LaTeX macros or, if possible, Latin-1 characters for the output stream.
Does ConTeXt have an equivalent, or can this one perhaps be modified? Bruce
Bruce D'Arcus (bdarcus@fastmail.fm) wrote:
I've been wondering about this myself. Given that the default encoding for XML is utf-8, it'd seem ConTeXt ought to support it if it's going to typeset XML.
Yes, that's logical and I'm wondering whether it's true given the fact that TeX by itself doesn't handle utf-8.
I posted a note about the tbook project (http://tbookdtd.sf.net/) a couple of weeks ago, which includes this binary (description from manual):
I saw it and it sounds good. It even has xindy for indexing, which I use under LyX (LaTeX). It would be nice to have similar capabilities with ConTeXt; however, I need DocBook exchangeability.
tbrplent is a filter program that scans for non-ASCII UTF-8 sequences in the input stream and creates decent LaTeX macros or, if possible, Latin-1 characters for the output stream.
Does ConTeXt have an equivalent, or can this one perhaps be modified?
I hope some ConTeXt guru can enlighten us. Sincerely, Gour
On Sat, Nov 30, 2002 at 09:15:45PM +0100, Gour wrote:
Simon Pepping (spepping@scaprea.hobby.nl) wrote:
I would like to know that too :-) I have not yet found the time to find out how Context deals with encodings. I only have a note that says that one should do \useXMLfilter [utf], and that I should have a look at the xtag-utf (which is input by the above command) or enco files.
As far as I can see ConTeXt does not understand utf-8 encoding.
Where did you find this note mentioning utf?
On my computer :-) I collected remarks made on this list in that document.
Some time ago I saw a post on the DocBook list from Sebastian Rahtz, who is considering rewriting PassiveTeX with ConTeXt support instead of LaTeX.
That would be very good; much better than just doing docbook. Sometimes I think I would do better to spend my time on such an effort, but I am afraid it is a huge task.
The question remains, how to do it with multi-lingual document encoded in utf-8?
Any hint?
As is more often the case in open source: do it yourself. Hans has not taken part in this discussion, so I think he does not feel like embarking on an effort in this area.

The basic mechanism to make TeX work with encodings is to declare all characters above 127 active, and map them to a suitable control sequence. But that only works with single-byte encodings. xmltex, David Carlisle's XML parser in TeX, which is used by PassiveTeX, can swallow and interpret utf-8 encoding. I think he applies the utf-8 rules to the sequences of single bytes. It should be easy to transfer this to Context, because it should not be macro package dependent.

The other options are: use an input filter, like the program that was mentioned in this thread. Or use NTS, the Java based TeX implementation. Currently it does not deal with multibyte encodings, because it is artificially restricted to 256 characters (if I remember correctly) and because there are no input encoding macro packages for higher character codes.

Sebastian's PassiveTeX has long mapping tables for unicode to LaTeX control sequences. These can be translated to Context. (And they could be made to work with NTS.)

While I am writing this, I am beginning to think that copying xmltex's algorithm to Context is the best way to go.

Regards, Simon
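[To make the active-character idea concrete, a minimal plain TeX sketch - my illustration, not actual xmltex or ConTeXt code; it assumes continuation bytes come in with catcode 12 (other):

  % make the utf-8 lead byte C4 active; it then consumes the
  % continuation byte that follows and picks the matching glyph
  \catcode`\^^c4=\active
  \def^^c4#1{%
    \ifnum`#1=129 % the sequence C4 81 is utf-8 for U+0101
      \amacron
    \else
      % ... the other continuation bytes after C4 cover U+0100-U+013F
    \fi}

The real implementations use lookup tables per lead byte rather than chains of conditionals, but the mechanism is the same.]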
Hi, On Mon, 2 Dec 2002, Simon Pepping wrote:
On Sat, Nov 30, 2002 at 09:15:45PM +0100, Gour wrote:
Simon Pepping (spepping@scaprea.hobby.nl) wrote:
says that one should do \useXMLfilter [utf], and that I should have a look at the xtag-utf (which is input by the above command) or enco files.
As far as I can see ConTeXt does not understand utf-8 encoding.

Well, it works with utf8 if you include xtag-utf.tex ($TEXMF/tex/context/base/xtag-utf.tex). This works, for instance:

  \input xtag-utf.tex
  øãö
  \bye

(\o, \~a, \"o)
The problem is that that file doesn't contain all of the >50000 characters, but only a few (basically latin-1 accented characters).
Where did you find this note mentioning utf?

I think it went over the mailing list (look at the mail archive).
Some time ago I saw a post on the DocBook list from Sebastian Rahtz, who is considering rewriting PassiveTeX with ConTeXt support instead of LaTeX.

That would be nice!
The question remains, how to do it with a multi-lingual document encoded in utf-8? Any hint?

See above. The problem is that a nice font would be needed too.
By the way, I'm looking for a nice looking serif font which I can use as a math font and which contains at least all MES-1, better also the MES-2 characters (http://www.evertype.com/standards/iso10646/pdf/cwa13873.pdf) and the default ligatures used by TeX. So far I have mainly found either WGL4 compatible fonts (http://partners.adobe.com/asn/developer/opentype/appendices/wgl4.html) or fonts which can be used for math in TeX, but not both. (At least not within an amount of money which I can spend ;-) Tobias
At 08:46 PM 12/2/2002 +0100, you wrote:
Some time ago I saw a post on DocBook list from Sebastian Rahtz who is considering to rewrite PassiveTex with ConTeXt support instead of LaTeX.
That would be very good; much better than just doing docbook. Sometimes I think I would do better to spend my time on such an effort, but I am afraid it is a huge task.
since i know a bit about the context internals ... actually, the best way to handle it is to build on top of the low level counterparts; will look into that later [the only use i see for fo's in our workflows is as sub docs; kind of image like approach; so i'll write it anyway]
The question remains, how to do it with multi-lingual document encoded in utf-8?
Any hint?
As is the case more often in open source: do it yourself. Hans has not taken part in this discussion, so I think he does not feel like embarking on an effort in this area.
hm, utf is on my agenda, but for chars that i don't use myself i depend on others -)
The basic mechanism to make TeX work with encodings is to declare all characters above 127 active, and map them to a suitable control sequence. But that only works with single-byte encodings.
the machinery is already there for quite some time; it's the way chinese/korean is implemented; and for western languages the mechanism is even simpler.
While I am writing this, I am beginning to think that copying xmltex's algorithm to context is the best way to go.
not sure about that; as said, context has the machinery already; i only need tables to work with (+the conversion stuff taco mailed earlier) Hans ------------------------------------------------------------------------- Hans Hagen | PRAGMA ADE | pragma@wxs.nl Ridderstraat 27 | 8061 GH Hasselt | The Netherlands tel: +31 (0)38 477 53 69 | fax: +31 (0)38 477 53 74 | www.pragma-ade.com ------------------------------------------------------------------------- information: http://www.pragma-ade.com/roadmap.pdf documentation: http://www.pragma-ade.com/showcase.pdf -------------------------------------------------------------------------
At 08:18 PM 11/29/2002 +0100, Simon Pepping wrote (about utf8):
I would like to know that too :-) I have not yet found the time to find out how Context deals with encodings. I only have a note that says that one should do \useXMLfilter [utf], and that I should have a look at the xtag-utf (which is input by the above command) or enco files.
I would hope that context develops generic input encoding support, so that I only have to scan the encoding value in the XML declaration, and input the appropriate encoding file.
sure, but for that i need input on how those vectors look; xtag-utf is a starting point, but i also need the second 'bank', so who can

(1) provide me with proper test files (can be simple text with utf-8 text)
(2) provide the mapping list in terms of <nr><nr> => \namedglyph

later some kind of language dependency needs to be brought in [so, the framework is already there]

Hans
Hans Hagen (pragma@wxs.nl) wrote:
sure, but for that i need input on how those vectors look; xtag-utf is a starting point, but i also need the second 'bank', so who can
Where can I get some more info on how this xtag-utf vector should look?
(1) provide me with proper test files (can be simple text with utf 8 text)
UTF-8_and_Unicode_FAQ has some test files and I'm sure this step is not a problem.
(2) provide the mapping list in terms of <nr><nr> => \namedglyph
I need some explanation: e.g. amacron (small latin letter "a" with macron) has Unicode code U+0101. When I look within Vim (g8 function) it shows me that it has the "c4 81" hex value in UTF-8 encoding. In this way it's possible to get both the Unicode code & the hex value in UTF-8 encoding.

So I'm interested, what would be the correct entry for the above mentioned list in the case of amacron:

a) c4 81 -> amacron
b) 0101 -> amacron

or something else? Please give me some more info and I'd be glad to help, since I'm sure that utf-8 support for ConTeXt is the right way to go. With a package like DocBookInConTeXt one can author directly in XML and have all the advantages of using a standard DTD; then one can map the document to ConTeXt, take advantage of its capabilities, and get the superb TeX quality output. With utf-8 support, getting the westernized transliteration of Sanskrit mentioned in another thread is a piece of cake.
later some kind of language dependency needs to be brought in
Sure.
[so, the framework is already there]
Nice to hear that. Let's move forward :-) Sincerely, Gour
At 02:59 PM 12/2/2002 +0100, Gour wrote:
Hans Hagen (pragma@wxs.nl) wrote:
sure, but for that i need input on how those vectors look; xtag-utf is a starting point, but i also need the second 'bank', so who can
Where can I get some more info on how this xtag-utf vector should look?
in xtag-utf.tex in .../tex/context/base (at least in my version and the beta)
(1) provide me with proper test files (can be simple text with utf 8 text)
UTF-8_and_Unicode_FAQ has some test files and I'm sure this step is not a problem.
So, where can i find that doc?
(2) provide the mapping list in terms of <nr><nr> => \namedglyph
I need some explanation: e.g. amacron (small latin letter "a" with macron) has Unicode code U+0101. When I look within Vim (g8 function) it shows me that it has the "c4 81" hex value in UTF-8 encoding.
In this way it's possible to get both the Unicode code & the hex value in UTF-8 encoding.
So I'm interested, what would be the correct entry for the above mentioned list in the case of amacron:
a) c4 81 -> amacron
b) 0101 -> amacron
or something else?
so, c4 is the trigger, and 81 the character; this means that the function attached to c4 has to map the 81 onto \amacron

can you make me a file with a list like:

  amacron : 01/01 : c4/81 : <utfcode>
  ^^^^^^^^^^^^^^^^^^^^^^^   ^^^^^^^^^
  normal ascii              real utf

Hans
UTF8 encoding is rather simple, really:

  byte number:  b1           b2           b3           b4

  0   -- 127                                           = unicode 0x00 - 0x7F
  192 -- 223   128 -- 191                              = unicode 0x80 - 0x7FF
  224 -- 239   128 -- 191   128 -- 191                 = unicode 0x800 - 0xFFFF
  240 -- 247   128 -- 191   128 -- 191   128 -- 191    = unicode 0x10000 - 0x1FFFFF

There are also sequences for 5 and 6 bytes, but these are illegal for Unicode representations at the moment:

  248 -- 251   followed by four bytes 128 -- 191
  252 -- 253   followed by five bytes 128 -- 191

128 -- 191 are illegal as first chars in UTF8 (that is handy for error-recovery); 254 and 255 are completely illegal and should not appear at all (if you see them, it's a safe bet that the document is encoded as UTF16, not UTF8).

The unicode number for a UTF8 sequence can be calculated as:

  byte1                                                                 if byte1 <= 127
  (byte1-192)*64 + (byte2-128)                                          if 192 <= byte1 <= 223
  (byte1-224)*4096 + (byte2-128)*64 + (byte3-128)                       if 224 <= byte1 <= 239
  (byte1-240)*262144 + (byte2-128)*4096 + (byte3-128)*64 + (byte4-128)  if 240 <= byte1 <= 247

Simple, eh? -- groeten, Taco
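[To make the two-byte rule concrete with the amacron example from elsewhere in this thread: the sequence c4 81 decodes as

  (0xC4 - 192)*64 + (0x81 - 128) = 4*64 + 1 = 257 = 0x0101

which is indeed U+0101, latin small letter a with macron.]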
Hans Hagen (pragma@wxs.nl) wrote:
in xtag-utf.tex in .../tex/context/base (at least in my version and the beta)
On my SuSE 8.0 I didn't find it, but fortunately it's in the beta which I downloaded :-) So here I see something like:

  \defineUTFcharacter amacron 1 1

which corresponds to the Unicode code of amacron, U+0101, and agrees with the output of Vim's "ga" function, which shows:

  <ā> 257, Hex 0101, Octal 401

Now it's just a question of a little work to slowly populate this vector with the values for different Unicode characters.
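[If the two numbers are indeed the high and low bytes of the Unicode code point - an assumption on my part, based only on the amacron entry - neighbouring Latin Extended-A entries would presumably look like:

  \defineUTFcharacter Amacron 1 0   % U+0100
  \defineUTFcharacter amacron 1 1   % U+0101
  \defineUTFcharacter Abreve  1 2   % U+0102
  \defineUTFcharacter abreve  1 3   % U+0103]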
UTF-8_and_Unicode_FAQ has some test files and I'm sure this step is not a problem.
So, where can i find that doc?
The FAQ document is at: http://www.cl.cam.ac.uk/~mgk25/unicode.html and the example files are under: http://www.cl.cam.ac.uk/~mgk25/unicode.html#examples

Pls. take a look at http://www.macchiato.com/unicode/Unicode_transcriptions.html under the examples list. There is also a Unicode converter: http://www.macchiato.com/unicode/convert.html
a) c4 81 -> amacron
b) 0101 -> amacron
so, c4 is the trigger, and 81 the character; this means that the function attached to c4 has to map the 81 onto \amacron
I'm not sure whether c4 is the trigger for the 81 character. c4 81 is the two-byte representation in memory (that's what you'll see in some hexadecimal editor) of the Unicode amacron character with the code U+0101, or simply put: the utf-8 code for amacron :-)
can you make me a file with a list like:
amacron : 01/01 : c4/81 : <utfcode>
^^^^^^^^^^^^^^^^^^^^^^^   ^^^^^^^^^
normal ascii              real utf
So, the line for amacron should look like:

  amacron : 01/01 : c4/81

since c4/81 is the utf code for amacron. Is this OK? -- Gour
Hmm, I just wrote my email before fetching the emails that had been exchanged over the weekend. Not a good idea. On Mon, Dec 02, 2002 at 06:40:30PM +0100, Gour wrote:
Hans Hagen (pragma@wxs.nl) wrote:
a) c4 81 -> amacron
b) 0101 -> amacron
so, c4 is the trigger, and 81 the character; this means that the function attached to c4 has to map the 81 onto \amacron
I'm not sure whether c4 is the trigger for the 81 character.
c4 81 is the two-byte representation in memory (that's what you'll see in some hexadecimal editor) of the Unicode amacron character with the code U+0101, or simply put: the utf-8 code for amacron :-)
can you make me a file with a list like:
amacron : 01/01 : c4/81 : <utfcode>
^^^^^^^^^^^^^^^^^^^^^^^   ^^^^^^^^^
normal ascii              real utf
So, the line for amacron should look like:
amacron : 01/01 : c4/81
since c4/81 is the utf code for amacron.
Is this OK?
I do not think the mapping files should touch utf-8. The input mechanism should map utf-8 to unicode, and then the mapping should map unicode to a macro. In that way the same mapping can be used by other encodings, provided they have an input mapping to unicode. Simon
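[Schematically - my illustration of this two-stage proposal, with the latin-2 byte value recalled from memory:

  utf-8 bytes c4 81  ->  unicode 0x0101  ->  \amacron
  latin-2 byte b1    ->  unicode 0x0105  ->  \aogonek

Only the first arrow depends on the input encoding; the unicode-to-macro table is shared.]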
At 09:16 PM 12/2/2002 +0100, you wrote:
I do not think the mapping files should touch utf-8. The input mechanism should map utf-8 to unicode, and then the mapping should map unicode to a macro. In that way the same mapping can be used by other encodings, provided they have an input mapping to unicode.
actually, utf maps onto context internal named glyphs Hans
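[Presumably that means combining the xtag-utf entry quoted earlier with the kind of definition the enco files use - a sketch from memory, not checked against the actual sources:

  \defineUTFcharacter amacron 1 1                            % U+0101 -> named glyph
  \definecharacter amacron {\buildtextaccent\textmacron a}   % glyph -> rendering

so the named glyph stays stable while the second definition can differ per font encoding.]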
On Mon, Dec 02, 2002 at 10:57:19PM +0100, Hans Hagen wrote:
At 09:16 PM 12/2/2002 +0100, you wrote:
actually, utf maps onto context internal named glyphs
I had a brief look into xtag-utf, and the xtag-me? and xtag-mx? modules. I totally underestimated how much you have already done in this area.

How does one get an internal name for, say, a Devanagari symbol? It should somehow refer to a font or a font encoding that contains such a symbol. Should the font encoding define such internal names, and map them to the glyph indices in the font?

In another mail you refer to Chinese and Korean support; where can that be seen?

Regards, Simon
At 09:03 PM 12/3/2002 +0100, you wrote:
On Mon, Dec 02, 2002 at 10:57:19PM +0100, Hans Hagen wrote:
At 09:16 PM 12/2/2002 +0100, you wrote:
actually, utf maps onto context internal named glyphs
I had a brief look into xtag-utf, and the xtag-me? and xtag-mx? modules. I totally underestimated how much you have already done in this area.
I'll post the updated utf handler asap; documenting it now
How does one get an internal name for say a Devanagari symbol? It should somehow refer to a font or a font encoding that contains such a symbol. Should the font encoding define such internal names, and map them to the glyph indices in the font?
names are best; for languages like chinese things are slightly more complicated because there the handler (several encodings are supported there) must take care of inter character breaking as well
In another mail you refer to Chinese and korean support; where can that be seen?
chinese is described in a manual at our site (follow showcase -> manuals); context documentation is being translated into chinese as well. korean is currently being implemented by Cho Jin-Hwan and Wang Lei (also supporting an extended version of dvipdfm which does unicode; quite nice to see chinese in widgets)

Hans
Hans Hagen (pragma@wxs.nl) wrote:
names are best; for languages like chinese things are slightly more complicated because there the handler (several encodings are supported there) must take care of inter character breaking as well
I've just looked briefly at two Devanagari Unicode fonts, and e.g. Devanagari letter A, with the code U+0905, is named "glyph92". The other characters just follow the pattern. Sincerely, Gour
At 03:10 PM 12/4/2002 +0100, you wrote:
Hans Hagen (pragma@wxs.nl) wrote:
names are best; for languages like chinese things are slightly more complicated because there the handler (several encodings are supported there) must take care of inter character breaking as well
I've just looked briefly at two Devanagari Unicode fonts, and e.g. Devanagari letter A, with the code U+0905, is named "glyph92". The other characters just follow the pattern.
hm, in the thousands-of-glyphs test doc that i use i see that they do have proper names; what is a good type1 font for testing? we need

- some demo utf input
- a font with the glyphs
- a suitable map/encoding file for pdftex

Hans
Hans Hagen (pragma@wxs.nl) wrote:
hm, in the thousands-of-glyphs test doc that i use i see that they do have proper names; what is a good type1 font for testing?
I only found a few ttf fonts. If they can be used/transformed, I can send them. Here is the url for the Titus font, created in cooperation with Bitstream Inc.: http://titus.uni-frankfurt.de/unicode/unitest2.htm#TITUUT
we need
- some demo utf input
Maybe Richard can supply something.
- a font with the glyphs
Pls. see the above mentioned font. Sincerely, Gour
On Wed, Dec 04, 2002 at 09:08:47PM +0100, Gour wrote:
Hans Hagen (pragma@wxs.nl) wrote:
we need
- some demo utf input
Maybe Richard can supply something.
Well you asked for it ;-) I've uploaded the following. Do with them as you see fit ...

   1385 Dec 4 23:08 Skt_CSXp.tex
   1436 Dec 4 23:08 Skt_UTF-8.tex
  18227 Dec 4 23:56 Skt_UTF-8_Nagari_BCA_1_1.png
    311 Dec 4 23:52 Skt_UTF-8_Nagari_BCA_1_1.txt
  22786 Dec 4 23:55 Skt_UTF-8_Roman_BCA_1_1.png
    161 Dec 4 23:52 Skt_UTF-8_Roman_BCA_1_1.txt

Each can be accessed at: http://homepages.comnet.co.nz/~r-mahoney/sundries/file_name

The PNGs show their respective files in `yudit'. I can't comment on IE, but the Nagari and Roman translit in these text files displays without issue with Netscape7 under FreeBSD 4.7-STABLE. Your mileage may vary.

Regards, Richard

P.S. Please say if you need anything else.

-- Richard Mahoney | E-mail: rbm49@ext.canterbury.ac.nz 78 Jeffreys Road | r.mahoney@comnet.net.nz Fendalton | Telephone: 0064-3-351-5831 CHRISTCHURCH 8005 | Cellular: 0064-25-829-986 NEW ZEALAND | http://homepages.comnet.co.nz/~r-mahoney
At 09:08 PM 12/4/2002 +0100, you wrote:
Hans Hagen (pragma@wxs.nl) wrote:
hm, in the thousands-of-glyphs test doc that i use i see that they do have proper names); what is a good type1 font for testing?
I only found a few ttf fonts. If they can be used/transformed, I can send them.
Here is the url for the Titus font, created in cooperation with Bitstream Inc.:
http://titus.uni-frankfurt.de/unicode/unitest2.htm#TITUUT
we need
- some demo utf input
Maybe Richard can supply something.
- a font with the glyphs
Pls. see the above mentioned font.
what we need for that font is a series of proper tfm files; do you know of any progress in that direction?

keep in mind that in order for utf to work, we need to switch fonts

Hans
On Thu, 05 Dec 2002 12:58:59 +0100, Hans wrote:
Pls. see above mentioned font.
what we need for that font is a series of proper tfm files; do you know of any progress in that direction
keep in mind that in order for utf to work, we need to switch fonts
So that's what happened to Cyberbit :) What is a "proper tfm" in this context? A collection of TFMs that are dumps of unicode hex blocks is easy to create. -- groeten, Taco
At 01:22 PM 12/5/2002 +0100, Taco Hoekwater wrote:
On Thu, 05 Dec 2002 12:58:59 +0100, Hans wrote:
Pls. see above mentioned font.
what we need for that font is a series of proper tfm files; do you know of any progress in that direction
keep in mind that in order for utf to work, we need to switch fonts
So that's what happened to Cyberbit :)
What is a "proper tfm" in this context? A collection of TFMs that are dumps of unicode hex blocks is easy to create.
indeed, just found out how to do that; i think that we just need a series of enc files like:

  /Unicode0x09 [
    /index0x0900 /index0x0901 /index0x0902 /index0x0903
    /index0x0904 /index0x0905 /index0x0906 /index0x0907
    /index0x0908 /index0x0909 /index0x090A /index0x090B
    /index0x090C /index0x090D /index0x090E /index0x090F
    /index0x0910 /index0x0911 /index0x0912 /index0x0913
    ...

so:

  unifont.ttf -> afm : unifont-0x09.afm
                 tfm : unifont-0x09.tfm
                 enc : range0x09.enc
                 map : appropriate entry

am i right?

Hans
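[For the 'appropriate entry', a pdftex map line along these lines would presumably do - a guess at the exact form, using the hypothetical file names from above:

  unifont-0x09 <range0x09.enc <unifont.ttf

i.e. one tfm/enc/map triple per 256-character unicode block.]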
Hi, On Thu, 5 Dec 2002, Hans Hagen wrote:
- some demo utf input

http://www.cl.cam.ac.uk/~mgk25/unicode.html#examples also one line above.
- a font with the glyphs

There is also a nice shareware font, Code2000 (http://home.att.net/~jameskass/), which contains a lot of characters, but only in normal (not italic/bold ...)
Additionally there are:

http://bibliofile.mc.duke.edu/gww/fonts/Unicode.html
http://www.hclrss.demon.co.uk/unicode/fontsbyrange.html

Maybe http://bibliofile.mc.duke.edu/gww/fonts/Unicode.html is best, since it is both free and also contains italic, SC etc.

Tobias
On Thu, Dec 05, 2002 at 12:58:59PM +0100, Hans Hagen wrote:
I only found few ttf fonts. If they can be used/transformed, I can send them.
Here is the url for Titus font created in cooperation with Bitstream Inc.:
For a list of UTF-8 TrueType fonts with Indological diacritics see
Andrew Glass's latest post to INDOLOGY:
http://listserv.liv.ac.uk/archives/indology.html
Date: Thu, 5 Dec 2002 08:04:01 -0800
Reply-To: Indology
what we need for that font is a series of proper tfm files; do you know of any progress in that direction
keep in mind that in order for utf to work, we need to switch fonts
In creating Type 1 fonts with Indological diacritics, would `mkt1font' and `vpl2vpl' be helpful? See:

ftp://bombay.oriental.cam.ac.uk/pub/john/software/programs/accfonts/README

Regards, Richard Mahoney
At 08:09 AM 12/6/2002 +1300, Richard Mahoney wrote:
On Thu, Dec 05, 2002 at 12:58:59PM +0100, Hans Hagen wrote:
I only found few ttf fonts. If they can be used/transformed, I can send them.
Here is the url for Titus font created in cooperation with Bitstream Inc.:
btw, quite painful that this font has funny glyph names Hans
Hi, what wget knobs and file should I currently, ideally use for polling the full set of documentation and other material off the Pragma server? Cheers, mh -- Michael Hallgren, http://m.hallgren.free.fr/, mh2198-ripe
"Michael Hallgren"
what wget knobs and file should I currently, ideally use for polling the full set of documentation and other material off the Pragma server?
get the file:

  http://www.pragma-ade.com/context.www

and type

  wget -Nxi context.www

for an automatic download. It seems as if this is not 100% up to date, but almost.

Patrick
Hi,
what wget knobs and file should I currently, ideally use for polling the full set of documentation and other material off the Pragma server?
get the file:
Thanks, that was the file name I had forgotten.
and type wget -Nxi context.www for an automatic download. It seems as if this is not 100% up to date, but almost.
mh
Patrick
Hans Hagen (pragma@wxs.nl) wrote:
btw, quite painful that this font has funny glyph names
What is the general proposal for solving the issue with the glyph names? For the lower parts of the font there is probably some standard, but one can expect problems with the upper parts. Sincerely, Gour
At 04:36 PM 12/6/2002 +0100, you wrote:
Hans Hagen (pragma@wxs.nl) wrote:
btw, quite painful that this font has funny glyph names
What is the general proposal for solving the issue with the glyph names?
For the lower parts of the font there is probably some standard, but one can expect problems with the upper parts.
i'm trying something: unicode0x09.enc with entries like /.c0x0012 and the like, but ttf2afm and/or ttf2tfm somehow don't see things in the same way and/or skip ranges and/or mess up things [i spent a good deal of time searching for scripts and apps and making a few perl scripts, and am slowly getting depressed now]

Hans
On Mon, Dec 02, 2002 at 06:40:30PM +0100, Gour wrote:
So here I see something like:
\defineUTFcharacter amacron 1 1
which corresponds to the Unicode code of amacron, U+0101, and agrees with the output of Vim's "ga" function, which shows:
<ā> 257, Hex 0101, Octal 401.
Now it's just a question of a little work to slowly populate this vector with the values for different Unicode characters.
To save yourself time you could look at two C programmes that indicate CSXp, UTF-8 Roman, and UTF-8 Devanagari codings:

  `csxp2ur'  -- converts CSXp --> UTF-8 Roman
  `ur2ud.c'  -- converts UTF-8 Roman --> UTF-8 Devanagari

Both are from: ftp://bombay.oriental.cam.ac.uk/pub/john/software/programs/

Regards, Richard
Richard Mahoney (rbm49@ext.canterbury.ac.nz) wrote:
To save yourself time you could look at two C programmes that indicate CSXp, UTF-8 Roman, and UTF-8 Devanagari codings:
Thank you for that, Richard. However, since I'm not so familiar with the Devanagari script, I thought I would just provide the part of the utf-8 vector for the western transliteration characters (according to the list you provided at the URL bca*.html). There are around 30 characters and I have defined them all for entering within Vim as well as in X via the Compose key in epcEdit. Providing this part of Unicode would already cover the needs of some users :-) Sincerely, Gour
participants (9)
- Bruce D'Arcus
- Gour
- Hans Hagen
- Michael Hallgren
- Patrick Gundlach
- Richard Mahoney
- Simon Pepping
- Taco Hoekwater
- Tobias Burnus