can Context render complex scripts?

Michael Saunders

11 Jun 2010

9:14 a.m.

My first experiments aren't going well. For example, using the free font BNBDOT0N.ttf from Deutsche Welle here: http://www.dw-world.de/dw/article/0,,3219221,00.html and the following typescript file, "type-bidisha.tex":

\starttypescript [serif] [dwbangla]
  \definefontsynonym[DWbangla][name:BNBIDISHAOpentypeNormal][features=body]
\stoptypescript

\starttypescript [serif] [dwbangla]
  \definefontsynonym[Serif][DWbangla][features=body]
\stoptypescript

\starttypescript [dwbangla]
  \definetypeface [dwbangla] [rm] [serif] [dwbangla] [default] [script=beng,features=body]
\stoptypescript

I try the following test:

\definefontfeature[default][mode=node,language=dflt,script=latn,kern=yes,liga=yes,tlig=yes,trep=yes]
\definefontfeature[body][default][mode=node,script=latn,onum=yes,pnum=yes,calt=yes,protrusion=quality,expansion=quality]
%just to be sure:
\definefontfeature[indic][body][nukt=yes,akhn=yes,rphf=yes,blwf=yes,half=yes,pstf=yes,vatu=yes,pres=yes,blws=yes,abvs=yes,psts=yes,
  haln=yes,blwm=yes,abvm=yes,dist=yes]

\usetypescriptfile[type-bidisha]

\starttypescript [MTbook]
  \definetypeface [dwbangla] [rm] [serif] [dwbangla] [default] [script=beng,language=ben,features=body]
\stoptypescript

\def\bengali#1{{\switchtobodyfont[dwbangla]\addff{indic}\language[ben]#1}}

\usetypescript[MTbook]

\starttext
\bengali{সত্যজিৎ রায়}
\stoptext

"সত্যজিৎ" isn't rendered correctly in the output---after the first two characters, things go wrong. Yet, with the same font, it is rendered correctly everywhere else I look in Windows---Notepad, Firefox, TeXnicCenter, etc.

To see what a correct rendering should look like, google "সত্যজিৎ রায়" or see here: http://bn.wikipedia.org/wiki/%E0%A6%B8%E0%A6%A4%E0%A7%8D%E0%A6%AF%E0%A6%9C%E... or here (first word in text, in bold): http://i367.photobucket.com/albums/oo113/andbipul/All%20about%20JJ/JJ%20Torj...

I tried this with several other free and MS fonts (e.g., Arial Unicode MS) and got the same results. Am I doing something wrong?

Khaled Hosny

11 Jun

9:49 a.m.

On Fri, Jun 11, 2010 at 02:14:12AM -0500, Michael Saunders wrote:

...

My first experiments aren't going well. For example:

AFAIK, there is no Indic-specific support yet. Hans will probably need more information about Indic shaping, some test files, the expected output, etc.

Also, AFAIK, Indic support in OpenType is a bit messy: there are two specifications by MS, one deprecating the other, and there are fonts in the wild that implement one or the other. There isn't any free implementation of the new specification, BTW. Digging through the MS typography site for more information would be a good start, I guess.

--
Khaled Hosny
Arabic localiser and member of Arabeyes.org team
Free font developer

Hans Hagen

13 Jun

4:59 p.m.

On 11-6-2010 9:14, Michael Saunders wrote:

...

My first experiments aren't going well. For example:

using the free font, BNBDOT0N.ttf, from Deutsche Welle here: http://www.dw-world.de/dw/article/0,,3219221,00.html

next time make a simple example .. you can identify features with

mtxrun --script font --info --list --file BNBDOT0N.ttf

anyhow, only one feature is applied. It looks like some gpos feature is not used.

\usemodule[fnt-20]

\definefontfeature
  [indic]
  [mode=node,analyze=yes,
   script=beng,language=dflt,
   % gsub
   abvs=yes,akhn=yes,blwf=yes,blws=yes,
   half=yes,nukt=yes,pstf=yes,psts=yes,rphf=yes,
   % gpos
   blwm=yes,abvm=yes]

\definefontsynonym[dwbangla][file:BNBDOT0N.ttf]

\starttext

{\definedfont[dwbangla*indic] সত্যজিৎ রায় \par}

\showotfcomposition {dwbangla*indic} {0} {সত্যজিৎ রায়}

\stoptext

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------

Michael Saunders

6:58 p.m.

...

next time make a simple example .. you can identify features with

mtxrun --script font --info --list --file BNBDOT0N.ttf

anyhow, only one feature is applied. It looks like some gpos feature is not used.

\usemodule[fnt-20]

\definefontfeature
  [indic]
  [mode=node,analyze=yes,
   script=beng,language=dflt,
   % gsub
   abvs=yes,akhn=yes,blwf=yes,blws=yes,
   half=yes,nukt=yes,pstf=yes,psts=yes,rphf=yes,
   % gpos
   blwm=yes,abvm=yes]

\definefontsynonym[dwbangla][file:BNBDOT0N.ttf]

\starttext

{\definedfont[dwbangla*indic] সত্যজিৎ রায় \par}

\showotfcomposition {dwbangla*indic} {0} {সত্যজিৎ রায়}

\stoptext

I don't understand---are you saying this is supposed to work? Is the trick supposed to be using fnt-20 or in being careful not to turn on unused features? (To be on the safe side, since I was switching between them in testing, I was turning on all the features Microsoft calls Indic.)

I tried your example and, yes, one shaping looks correct, but there was no reordering and now some of the characters print out on top of each other (which is incorrect). I tried the same routine, using fnt-20 and commenting out unused features, for some other fonts:

Akaash: http://www.nongnu.org/freebangfont/downloads.html
Bangla, from the University of Chicago: http://salrc.uchicago.edu/resources/fonts/available/bengali/
Arial Unicode MS (standard on Windows)

and I got no improvement in their rendering in Context. They work fine in Notepad, Firefox, and TexnicCenter though, for example. What am I missing?

Hans Hagen

9:07 p.m.

On 13-6-2010 6:58, Michael Saunders wrote:

...

...
next time make a simple example .. you can identify features with

mtxrun --script font --info --list --file BNBDOT0N.ttf

anyhow, only one feature is applied. It looks like some gpos feature is not used.

\usemodule[fnt-20]

\definefontfeature
  [indic]
  [mode=node,analyze=yes,
   script=beng,language=dflt,
   % gsub
   abvs=yes,akhn=yes,blwf=yes,blws=yes,
   half=yes,nukt=yes,pstf=yes,psts=yes,rphf=yes,
   % gpos
   blwm=yes,abvm=yes]

\definefontsynonym[dwbangla][file:BNBDOT0N.ttf]

\starttext

{\definedfont[dwbangla*indic] সত্যজিৎ রায় \par}

\showotfcomposition {dwbangla*indic} {0} {সত্যজিৎ রায়}

\stoptext

I don't understand---are you saying this is supposed to work? Is the

well, i wrote: it looks like some gpos feature is not applied ...

...

trick supposed to be using fnt-20 or in being careful not to turn on unused features? (To be on the safe side, since I was switching

no, but turning them on makes tracing cumbersome

...

between them in testing, I was turning on all the features Microsoft calls Indic.) I tried your example and, yes, one shaping looks correct, but there was no reordering and now some of the characters print out on top of each other (which is incorrect). I tried the same routine, using fnt-20 and commenting out unused features, for some other fonts:

...

What am I missing?

As Khaled mentioned ... are these proper otf fonts or do they rely on specific features in the microsoft engine?

Most opentype features are quite generic and should work ok but if something special is needed more info is needed.

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------

Khaled Hosny

9:15 p.m.

On Sun, Jun 13, 2010 at 09:07:55PM +0200, Hans Hagen wrote:

...

As Khaled mentioned ... are these proper otf fonts or do they rely on specific features in the microsoft engine?

Most opentype features are quite generic and should work ok but if something special is needed more info is needed.

IIRC, there is a bit of engine-level glyph reordering involved with Indic rendering.

--
Khaled Hosny
Arabic localiser and member of Arabeyes.org team
Free font developer
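
To make the reordering point concrete: in Bengali the vowel signs ি (U+09BF), ে (U+09C7) and ৈ (U+09C8) are stored after the consonant cluster they belong to but are drawn to its left, so something has to move them before the font's lookups see the text. The following is only a rough Python sketch of that one step, not ConTeXt, LuaTeX or Uniscribe code. The function names and the simplified cluster rule (consonants joined by hasant, then an optional vowel sign) are assumptions made for illustration; reph, split vowels, nukta and the rest of the Indic rules are ignored.

```python
# Illustrative sketch only: move Bengali pre-base vowel signs to the
# front of their consonant cluster. Names and rules are simplified assumptions.

HASANT = "\u09cd"                            # Bengali virama (hasant)
PRE_BASE = {"\u09bf", "\u09c7", "\u09c8"}    # vowel signs i, e, ai


def is_consonant(ch):
    # Main Bengali consonant range KA..HA; additional letters are ignored here.
    return "\u0995" <= ch <= "\u09b9"


def reorder_pre_base(text):
    out = []
    i = 0
    while i < len(text):
        if not is_consonant(text[i]):
            out.append(text[i])
            i += 1
            continue
        # Collect consonant (hasant consonant)* as one cluster.
        j = i + 1
        while (j + 1 < len(text) and text[j] == HASANT
               and is_consonant(text[j + 1])):
            j += 2
        cluster = text[i:j]
        # A pre-base vowel sign directly after the cluster is emitted before it.
        if j < len(text) and text[j] in PRE_BASE:
            out.append(text[j])
            j += 1
        out.append(cluster)
        i = j
    return "".join(out)
```

Applied to the test word from this thread, the sketch moves ি in front of জ and leaves the ত্য conjunct untouched; forming that conjunct is still the font's job.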

Khaled Hosny

9:19 p.m.

On Sun, Jun 13, 2010 at 10:15:35PM +0300, Khaled Hosny wrote:

...

On Sun, Jun 13, 2010 at 09:07:55PM +0200, Hans Hagen wrote:

...
As Khaled mentioned ... are these proper otf fonts or do they rely on specific features in the microsoft engine?

Most opentype features are quite generic and should work ok but if something special is needed more info is needed.

IIRC, there is a bit of engine level glyph reordering involved with Indic rendering.

Here is a TUGboat article with more details: http://www.tug.org/TUGboat/Articles/tb23-1/rajkumar.pdf

Though it might not be up to date, it gives a good overall idea about the issue.

--
Khaled Hosny
Arabic localiser and member of Arabeyes.org team
Free font developer

Khaled Hosny

9:28 p.m.

On Sun, Jun 13, 2010 at 10:19:50PM +0300, Khaled Hosny wrote:

...

On Sun, Jun 13, 2010 at 10:15:35PM +0300, Khaled Hosny wrote:

...
On Sun, Jun 13, 2010 at 09:07:55PM +0200, Hans Hagen wrote:

...
As Khaled mentioned ... are these proper otf fonts or do they rely on specific features in the microsoft engine?

Most opentype features are quite generic and should work ok but if something special is needed more info is needed.

IIRC, there is a bit of engine level glyph reordering involved with Indic rendering.

Here is a tugboat article with more details: http://www.tug.org/TUGboat/Articles/tb23-1/rajkumar.pdf

Though it might not be up to date, it gives a good overall idea about the issue.

And MS page: http://www.microsoft.com/typography/otfntdev/indicot/features.aspx

--
Khaled Hosny
Arabic localiser and member of Arabeyes.org team
Free font developer

Hans Hagen

9:41 p.m.

On 13-6-2010 9:15, Khaled Hosny wrote:

...

On Sun, Jun 13, 2010 at 09:07:55PM +0200, Hans Hagen wrote:

...
As Khaled mentioned ... are these proper otf fonts or do they rely on specific features in the microsoft engine?

Most opentype features are quite generic and should work ok but if something special is needed more info is needed.

IIRC, there is a bit of engine level glyph reordering involved with Indic rendering.

i remember seeing that mentioned at some place but forgot the details

the interesting question then is .. where does one draw the line between engine and clever fonts

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------

Idris Samawi Hamid ادريس سماوي حامد

10:07 p.m.

Hi, On Sun, 13 Jun 2010 13:41:38 -0600, Hans Hagen wrote:

...

the interesting question then is .. where does one draw the line between engine and clever fonts

contextual analysis in Arabic provides an example: init, medi, and fina provide the gsub's; the engine has to know the rules/conditions under which to apply each of those three. E.g., the VOLT Proofing tool will NOT do contextual analysis, only implement the selected lookups.

Something similar applies to Indic of course, but someone has to provide the framework of rules. The MS pages Khaled sent are a good place to start, along with a user willing to make examples with explanations etc...

That is, given all gsub and gpos routines needed by a given language, what minimal additional information does the engine need to know? In Arabic, one needs contextual analysis: if a letter occurs at the beginning of a string, choose the init gsub routine etc. The bad thing about Uniscribe is that it adds too much information and "spell-checking", restricting typesetting possibilities.

In any case, someone interested has to research/provide the details for Indic.

Best wishes
Idris

--
Professor Idris Samawi Hamid, Editor-in-Chief
International Journal of Shi`i Studies
Department of Philosophy
Colorado State University
Fort Collins, CO 80523
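
The kind of contextual analysis described above can be sketched in a few lines. This is only an illustration in Python, not how ConTeXt, VOLT or Uniscribe implement it. The function name is hypothetical, and the joining table is a tiny hand-picked subset chosen for the example (a real engine would take joining types for every character from the Unicode data). The result is the feature tag (init, medi, fina, or isol) that an engine would then ask the font to apply to each letter.

```python
# Illustrative sketch only: pick init/medi/fina/isol for each Arabic letter.
# The joining table below is a small hand-picked subset, not the full Unicode data.

DUAL = {"\u0628", "\u0633", "\u0644", "\u0645"}    # beh, seen, lam, meem: join on both sides
RIGHT = {"\u0627", "\u062f", "\u0631", "\u0648"}   # alef, dal, reh, waw: join to the previous letter only


def joining_forms(word):
    forms = []
    for i, ch in enumerate(word):
        prev = word[i - 1] if i > 0 else ""
        nxt = word[i + 1] if i + 1 < len(word) else ""
        known = ch in DUAL or ch in RIGHT
        # Connects backwards only if the previous letter can join forwards.
        joins_prev = known and prev in DUAL
        # Connects forwards only if this letter is dual-joining and a letter follows.
        joins_next = ch in DUAL and (nxt in DUAL or nxt in RIGHT)
        if joins_prev and joins_next:
            forms.append("medi")
        elif joins_prev:
            forms.append("fina")
        elif joins_next:
            forms.append("init")
        else:
            forms.append("isol")
    return forms
```

For the three-letter word beh, alef, beh this gives init, fina, isol: the final beh stays isolated because alef does not join forwards. The font only supplies the substitutions; the decision about which one applies where is exactly the part the engine has to know.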

Michael Saunders

14 Jun

3:21 a.m.

...

OpenType (just ignore file extension for the moment) is a rather dumb standard in the sense that it requires the engine to have some knowledge about the writing system at hand. ... So, what we have here is that ConTeXt has no special knowledge about Indic scripts, and thus it will not apply the features properly according to the linguistic rules.

It seems to me that, since Hans can't be expected to write special code for dozens of Indic scripts, let alone for every script in the world, the pragmatic solution would be a method for Context to harness external engines (ICU, Pango, Graphite, Uniscribe, or whatever). If that's not possible, or if he doesn't want to do it, or until he is able to work with scholars on each special case, programs like Notepad will be able to do something that Context can't. As a casual user, it's not an urgent need for me. I had read that Aleph functionality had been integrated into LuaTeX (though I never found anything very detailed about that) and thought that everything was okay. I assume that Luatex/Context or some future TeX will have this functionality someday, however it's implemented. I'll just keep watching.

Hans Hagen

9:57 a.m.

On 14-6-2010 3:21, Michael Saunders wrote:

...

...
OpenType (just ignore file extension for the moment) is a rather dumb standard in the sense that it requires the engine to have some knowledge about the writing system at hand. .... So, what we have here is that ConTeXt has no special knowledge about Indic scripts, and thus it will not apply the features properly according to the linguistic rules.

It seems to me that, since Hans can't be expected to write special code for dozens of Indic scripts, let alone for every script in the world, the pragmatic solution would be a method for Context to harness external engines (ICU, Pango, Graphite, Uniscribe, or whatever). If that's not possible, or if he doesn't want to do it, or until he is able to work with scholars on each special case, programs like Notepad will be able to do something that Context can't. As a casual user, it's not an urgent need for me. I had read that Aleph functionality had been integrated into LuaTeX (though I never found anything very detailed about that) and thought that everything was okay. I assume that Luatex/Context or some future TeX will have this functionality someday, however it's implemented. I'll just keep watching.

if you want to use external engines ... use xetex instead of luatex

the whole idea behind luatex is that we have a configurable and programmable engine ... doing some script is not so much an issue (and we can do it more flexibly once we have the machinery in place) but information is needed ... also, as we want tex cum suis to be flexible on the one hand and stable on the other, depending on hidden or fuzzy features in the uniscribe engine might not be the best idea

in the oriental tex project we've spent quite some time on high quality arabic and i'm sure that we could not have come this far if we had written the code from scratch (all this font technology is not as open as the name suggests) ... also, there's more in mkiv than shown so far (esp in that area) for those experiments

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------

Michael Saunders

13 Jun

9:29 p.m.

...

As Khaled mentioned ... are these proper otf fonts or do they rely on specific features in the microsoft engine?

They all carry the .ttf extension. Arial Unicode MS is clearly TrueType. I've seen some of the free fonts widely described as OpenType, but they are all amateur products---maybe they don't comply with the standard properly. If TrueType fonts won't work correctly without an external engine and Context can't harness an external engine, then it looks like I will just have to wait for a more modern font to come along.

Thanks for the article, Khaled.

Khaled Hosny

9:50 p.m.

On Sun, Jun 13, 2010 at 02:29:40PM -0500, Michael Saunders wrote:

...

...
As Khaled mentioned ... are these proper otf fonts or do they rely on specific features in the microsoft engine?

They all carry the .ttf extension. Arial Unicode MS is clearly TrueType. I've seen some of the free fonts widely described as OpenType, but they are all amateur products---maybe they don't comply with the standard properly.

If TrueType fonts won't work correctly without an external engine and Context can't harness an external engine, then it looks like I will just have to wait for a more modern font to come along.

OpenType (just ignore file extension for the moment) is a rather dumb standard in the sense that it requires the engine to have some knowledge about the writing system at hand. Take Arabic as an example: the engine needs to know the rules of Arabic shaping (which letters are dual-joining, which are right-joining only, etc.) and then apply OpenType features conditionally to those characters. Without that knowledge, no OpenType engine is capable of rendering Arabic, and the same is true for any other complex script.

On the other hand, there are AAT and Graphite fonts, which are implemented in a more generic way: the engine has no particular knowledge about any writing system and all the rules are embedded in the font. Though this seems a bit smarter, it makes the task of building a font very complex and tedious, since the rules have to be implemented again and again in each font. Also, type designers are not programmers, and asking them to do such complex tasks is impractical. For example, you can count all the AAT fonts that were produced outside Apple, the makers of the technology, on one hand, and now even Apple is moving to OpenType.

So, what we have here is that ConTeXt has no special knowledge about Indic scripts, and thus it will not apply the features properly according to the linguistic rules. So, instead of waiting for a "more modern font to come along" (which is unlikely to happen any time soon, given how much the industry has invested in OpenType, and the apparent failure of AAT), just try to reach more people in the Indic community and come up with a clear specification and tests that Hans can implement.

Regards,
Khaled

--
Khaled Hosny
Arabic localiser and member of Arabeyes.org team
Free font developer

Hans Hagen

10:08 p.m.

On 13-6-2010 9:50, Khaled Hosny wrote:

...

So, what we have here is that ConTeXt has no special knowledge about Indic scripts, and thus it will not apply the features properly according to the linguistic rules. So, instead of waiting for a "more modern font to come along" (which is unlikely to happen any time soon, given how much the industry has invested in OpenType, and the apparent failure of AAT), just try to reach more people in the Indic community and come up with a clear specification and tests that Hans can implement.

from the ms pages it looks like we should split the chars (sequences) into components and after that the otf features can be applied; as the spec is somewhat fuzzy i'll just wait till precise specs are given

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
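
One possible reading of "split the chars into components" is the canonical decomposition that Unicode already defines for some Bengali characters, for example the two-part vowel sign ো (U+09CB), which decomposes into ে (U+09C7) plus া (U+09BE). Whether this is the splitting the MS pages have in mind is an assumption. The Python sketch below only shows that the decomposition itself is available from standard Unicode data, and the function name is hypothetical.

```python
# Illustrative sketch only: use Unicode canonical decomposition (NFD) to
# split precomposed Bengali characters into their component code points.
import unicodedata


def split_components(text):
    # NFD replaces each character by its canonical decomposition, if it has one.
    return unicodedata.normalize("NFD", text)


def code_points(text):
    # Helper for inspection: list the code points as U+XXXX strings.
    return ["U+%04X" % ord(ch) for ch in text]
```

For instance, code_points(split_components("\u0995\u09cb")) lists U+0995, U+09C7, U+09BE. After such a split, the first part is a pre-base sign that would still have to be reordered before the font features are applied.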
