Some Ethiopic examples (hyphenation/breaking) in ConTeXt

Mojca Miklavec

6 May 2011

10 p.m.

Dear Hans,

We were originally preparing the example for XeTeX (which behaves very weird anyway) and I would like to know how to typeset Ethiopic text in ConTeXt.

The basic requirements are:

- Words may be split after any character (character = syllable; it's in the range "1200-"139F), but not before word/sentence dividers. (We have hyphenation patterns, but one could just as well use some other mechanism to break.)
- "1361 and "1362 are word dividers and sentence dividers.
- One doesn't use spaces when writing.
- In output one should get something like space (approximately the same width) before and something like space after word/sentence divider, except that the "space" before divider should not be breakable; I highly suspect that the amount of space before/after dividers depends on the font being used, but I may be wrong.
- Text should be nicely justified (I wonder if microtypography would also help).

I'm attaching a sample text that does approximately what I expect it to do, but I would like to avoid active characters, make the space before and after divider of equal size and I'm not sure what is the most appropriate approach in ConTeXt. The example also leaves a bit too much whitespace after dividers that end the line.

Here's the font used in the example: http://scripts.sil.org/AbyssinicaSIL_Download

Thanks a lot,
Mojca

PS: In char-def.lua see

    [0x1361]={
     category="po",
     description="ETHIOPIC WORDSPACE",
     direction="l",
     linebreak="ba",
     unicodeslot=0x1361,
    },

where linebreak="ba" means "break after" or "allow break after this character". But I guess that ConTeXt ignores those meanings at the moment.
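The properties quoted from char-def.lua can be cross-checked outside ConTeXt. The following is only a minimal sketch in Python, assuming the standard `unicodedata` module; the helper name `describe` and the constants are made up for illustration and are not part of ConTeXt. Note that `unicodedata` does not expose the line-break class, so the linebreak="ba" value cannot be verified this way.

```python
import unicodedata

# Range given in the mail for Ethiopic characters: "1200-"139F (inclusive).
SYLLABLE_RANGE = range(0x1200, 0x13A0)
# Word divider and sentence divider named in the mail.
DIVIDERS = (0x1361, 0x1362)


def describe(codepoint):
    """Return a few Unicode properties of one code point (hypothetical helper)."""
    ch = chr(codepoint)
    return {
        "name": unicodedata.name(ch, "<unassigned>"),
        "category": unicodedata.category(ch),        # char-def.lua: category
        "direction": unicodedata.bidirectional(ch),  # char-def.lua: direction
        "in_range": codepoint in SYLLABLE_RANGE,
        "is_divider": codepoint in DIVIDERS,
    }


if __name__ == "__main__":
    for cp in DIVIDERS:
        print(f"U+{cp:04X}", describe(cp))
```

One thing this makes visible: both dividers lie inside the "1200-"139F range, so a rule of the form "break after any character in the range" needs an explicit exception for the position before a divider.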

Attachments:

context-geez.tex (application/x-tex — 1.6 KB)
lang-mul-ethi.lua (application/octet-stream — 237 bytes)
lang-mul-ethi.pat (application/octet-stream — 2.5 KB)


Hans Hagen

7 May

1:20 p.m.


On 6-5-2011 10:00, Mojca Miklavec wrote:

...

We were originally preparing the example for XeTeX (which behaves very weird anyway) and I would like to know how to typeset Ethiopic text in ConTeXt.

Let's forget about xetex then. It's not that complex to add to mkiv as we have mechanisms in place for it. What is the otf language / script code?

...

The basic requirements are:

- Words may be split after any character (character = syllable; it's in the range "1200-"139F), but not before word/sentence dividers. (We have hyphenation patterns, but one could just as well use some other mechanism to break.)

- "1361 and "1362 are word dividers and sentence dividers.

- One doesn't use spaces when writing.

Like in cjk.

...

- In output one should get something like space (approximately the same width) before and something like space after word/sentence divider, except that the "space" before divider should not be breakable; I highly suspect that the amount of space before/after dividers depends on the font being used, but I may be wrong.

so let's visualize that:

[1200][1200][1200][1361][1200][1200][1200][1362][1200][1200][1200]

valid breakpoints:

[1200] [1200] [1200][nbsp][1200] [1200] [1200][nbsp][1200] [1200] [1200]

Is that okay?

How about spaces in the input (end of lines introduce them)?

...

- Text should be nicely justified (I wonder if microtypography would also help).

That is independent of the logic.

...

I'm attaching a sample text that does approximately what I expect it to do, but I would like to avoid active characters, make the space before and after divider of equal size and I'm not sure what is the most appropriate approach in ConTeXt. The example also leaves a bit too much whitespace after dividers that end the line.

Nothing attached.

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------

Mojca Miklavec

4:19 p.m.

On Sat, May 7, 2011 at 13:20, Hans Hagen wrote:

...

On 6-5-2011 10:00, Mojca Miklavec wrote:

What is the otf language / script code?

Ethi for script and AMH for language (but language should probably not be needed).

...

...
- In output one should get something like space (approximately the same width) before and something like space after word/sentence divider, except that the "space" before divider should not be breakable; I highly suspect that the amount of space before/after dividers depends on the font being used, but I may be wrong.

so let's visualize that:

[1200][1200][1200][1361][1200][1200][1200][1362][1200][1200][1200]

valid breakpoints:

[1200] [1200] [1200][nbsp][1200] [1200] [1200][nbsp][1200] [1200] [1200]

Is that okay?

No, it should be:

[1200] [1200] [1200][nbsp][1361] [1200] [1200] [1200][nbsp][1362] [1200] [1200] [1200]

Word delimiters should be displayed.
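The corrected breakpoint scheme can be written down as a small rule: a breakable space between any two characters, except that a divider is preceded by a non-breaking space instead. The following is only a sketch in Python that reproduces the bracket notation used above; it is not how ConTeXt implements anything, the function name `mark_breaks` is made up, and the treatment of two dividers in a row (non-breaking space between them) is an assumption that the thread does not cover.

```python
# Word divider (U+1361) and sentence divider (U+1362) from the requirements.
DIVIDERS = {0x1361, 0x1362}


def mark_breaks(codepoints):
    """Render code points in the thread's notation.

    A plain space marks an allowed break; [nbsp] marks the unbreakable
    space that precedes a word or sentence divider.
    """
    out = []
    for index, cp in enumerate(codepoints):
        if index > 0:
            # No break is allowed before a divider; everywhere else a
            # break is allowed, including directly after a divider.
            out.append("[nbsp]" if cp in DIVIDERS else " ")
        out.append(f"[{cp:04X}]")
    return "".join(out)


if __name__ == "__main__":
    sample = [0x1200] * 3 + [0x1361] + [0x1200] * 3 + [0x1362] + [0x1200] * 3
    print(mark_breaks(sample))
```

Run on the sample sequence from the visualization, this gives the same line as the corrected version, with both dividers displayed.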

...

How about spaces in the input (end of lines introduce them)?

Adam?

My guess would be that they might not use end-of-lines except when they want to start a new paragraph, but I may well be wrong. If there are end-of-lines, they should probably be ignored: no extra space should be introduced (unless there are two, so that a new paragraph is started). But Adam should correct me.

In fact there are two different writing paradigms. One uses word separators and the other uses spaces. My guess is that the second one might have arisen in the modern era due to poor computer support. (If they are using spaces, they have at least a chance that words break in text editors and web browsers, but I may be wrong. Wikipedia uses spaces, for example, but all old books use separators.)

Anyway: if one uses the second paradigm (spaces instead of word separators), the end of line should be treated as a normal space and writing should be no different than for any other European language in Latin script.
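The end-of-line handling guessed at above could look roughly like this. It is only a sketch in Python of the two cases as described in this mail (which explicitly asks to be corrected), not of what ConTeXt does; the function name `normalize_paragraphs` and the `uses_spaces` flag are hypothetical.

```python
import re


def normalize_paragraphs(text, uses_spaces):
    """Split input into paragraphs and resolve single end-of-lines.

    uses_spaces=False: separator paradigm, a single end-of-line is dropped
                       so that no extra space is introduced.
    uses_spaces=True:  space paradigm, a single end-of-line acts as a
                       normal space, as in Latin-script text.
    Two end-of-lines in a row (an empty line) start a new paragraph.
    """
    joiner = " " if uses_spaces else ""
    paragraphs = []
    for block in re.split(r"\n\s*\n", text):
        # Drop surrounding whitespace on each line and skip empty lines.
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if lines:
            paragraphs.append(joiner.join(lines))
    return paragraphs


if __name__ == "__main__":
    source = "\u1200\u1201\n\u1202\u1361\u1203\n\n\u1204"
    print(normalize_paragraphs(source, uses_spaces=False))
    print(normalize_paragraphs(source, uses_spaces=True))
```

Keeping the paradigm as an explicit flag reflects the uncertainty in the mail: which behaviour is right depends on how the text was written, and that cannot be decided from the script alone.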

...

Nothing attached.

There was an attachment originally (see http://article.gmane.org/gmane.comp.tex.context/68230), but maybe your spam filter didn't like the Ethiopic spam. (My roommate was just robbed/scammed in Ethiopia last week; no wonder that even spam filters put the mails in the same category as Nigerian scams :)

Mojca

Arthur Reutenauer

5:59 p.m.

...

Ethi for script and AMH for language (but language should probably not be needed).

Indeed, as the same behaviour can be expected for several different languages using the same script.

Arthur

Hans Hagen

8 May

7:43 p.m.


On 7-5-2011 4:19, Mojca Miklavec wrote:

...

In fact there are two different writing paradigms. One uses word separators and the other uses spaces. My guess is that the second one might have arisen in the modern era due to poor computer support. (If they are using spaces, they have at least a chance that words break in text editors and web browsers, but I may be wrong. Wikipedia uses spaces, for example, but all old books use separators.)

So what are the rules for mixing languages/scripts then? [ethi] [latn] [ethi]

...

(My roommate was just robbed/scammed in Ethiopia last week; no wonder that even spam filters put the mails in the same category as Nigerian scams :)

I recently installed language blocking to the routers ... at some point I think it will become 'block all' unless 'a few countries'.

Hans

Arthur Reutenauer

8:13 p.m.

...

So what are the rules for mixing languages/scripts then?

I'm not sure there are any rules. There might be typographical traditions, though.

...

I recently installed language blocking to the routers ... at some point I think it will become 'block all' unless 'a few countries'.

There are scammers in every country, unfortunately. In London there is a well-known scam aimed at people looking for a flat to rent: adverts that look like genuine offers at first turn out to be just a trick by scammers to cheat you out of your money. I made contact with some of them when I first moved here, but stopped talking to them very soon because what they told me didn't make any sense. I feel uncomfortable when things don't make sense :-)

Arthur
