[NTG-context] Malayalam conjuncts

kauśika citturs at gmail.com
Sat Jan 1 05:18:53 CET 2022


On Friday, December 31, 2021 6:22:15 PM IST Ajith R via ntg-context wrote:
> The conjuncts that are not formed
> correctly are those where the second component is ര U+0D30, followed by
> a symbol that is shown on the right side of the conjunct viz ാ U+0D3E,
> ി U+0D3F, ീ U+0D40, ു  U+0D41, ൂ U+0D42. If no symbols follow, or
> symbols follow on the left or on both sides, the conjunct is well
> formed. In the minimum working example given, 5 is well formed, while the
> first conjunct in 6 is well formed and the second is ill formed.

I have been using ConTeXt to typeset documents in several Indic languages and 
have run into similar issues (in many languages). Please see this for a 
similar issue in some conjuncts for Devanagari:
https://www.mail-archive.com/ntg-context@ntg.nl/msg99691.html

For what it's worth, some fonts give me no trouble, while the issues persist 
with others. Some of these issues can be worked around, as I pointed out in 
the above posting.

In almost all cases I encountered no issues while using Xe(La)TeX. Based on 
some advice from Hans and on reading about these OTF features and their 
implementations in Indic fonts, I think these issues might be due to 
differences in implementation (I am not entirely sure, since I am a novice). 
My guess is that HarfBuzz (which is what Xe(La)TeX uses by default) uses some 
heuristics to work out these conjuncts (?!).

To answer your specific question regarding the conjuncts in the given words: 
you have to use some Unicode hacking to get what you want in ConTeXt.

In each of the following, ZWS refers to the Unicode character U+200B 
(zero-width space).

1. ശ്രീ 
This is typeset correctly by writing  
	ശ്ര + 
	ZWS (U+200B) + 
 	ീ

2. അശ്രു
Typeset correctly with
	അശ്ര + 
	ZWS (U+200B) + 
	ു

3. ശുശ്രൂഷ
Typeset correctly with
	ശുശ്ര +
	ZWS (U+200B) + 
	ൂ +
	ഷ

4. പ്രാസം
Typeset correctly with
	പ്ര + 
	ZWS (U+200B) + 
	ാ +
	സം

5. പ്രേയസി (rendered correctly as entered; no hacks necessary)

6. പ്രോഗ്രാം
Typeset correctly with
	പ്രോ +
	ഗ്ര + 
	ZWS (U+200B) +
	ാ +
	ം
where the last character is the Malayalam anusvara (U+0D02).
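
Since the ZWS is invisible in most editors, it can be hard to tell whether a 
word in the source already contains it. One way to check is to list the code 
points of the word. The following is only a minimal Python sketch; the helper 
name codepoints is made up for illustration and has nothing to do with 
ConTeXt itself.

```python
import unicodedata


def codepoints(text):
    """Return one 'U+XXXX NAME' string per character in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


# Example 1 above, written with the workaround:
# SHA + VIRAMA + RA + ZWS + VOWEL SIGN II
word = "\u0d36\u0d4d\u0d30\u200b\u0d40"
for line in codepoints(word):
    print(line)
```

If the ZWS is in place, a line reading "U+200B ZERO WIDTH SPACE" shows up 
between the RA and the vowel sign.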

Consider yet another example:
സാന്ദ്രാനന്ദാഅവബൊധാത്മകമ്

Here the 'ന്ദ്രാ' conjunct is not typeset correctly in ConTeXt. To fix this I write 
	ന്ദ്ര +
	ZWS (U+200B) +
	​ാ

This is what I have been doing to ensure correct typesetting of Malayalam and 
other Indic languages in ConTeXt. Honestly, it is inconvenient, since the .tex 
files are no longer clean Unicode text once they contain these extra 
characters. However, ConTeXt has so many remarkable features that the very 
thought of having to go back to (Xe)LaTeX (just for HarfBuzz rendering) causes 
me immense pain. As far as I am concerned, in every other way ConTeXt simply 
has no match in the (Xe)LaTeX world. In my usage of ConTeXt for my academic 
work (in English with lots of mathematics) I have encountered no issues, and 
even when I did, there was always some legitimate (non-hacky) fix. For me 
personally, the rendering of Indic languages is the only pain point with 
ConTeXt.

So I am willing to live with the drawbacks until the day they are, hopefully, 
fixed. Anyway, I hope you can use these fixes in the meantime. For example, if 
your editor supports it, you can search for every affected sequence and 
replace it with the corresponding recipe involving ZWS.
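
If the editor cannot do this, the replacement could also be scripted. Below is 
a rough Python sketch, not a tested tool: it assumes that the affected 
sequences are exactly those described in the quoted message, that is, virama 
(U+0D4D) + ര (U+0D30) followed by one of the right-side vowel signs U+0D3E to 
U+0D42. The function name insert_zws is made up for illustration.

```python
import re

ZWS = "\u200b"  # zero-width space

# virama + RA, followed by a right-side vowel sign (U+0D3E..U+0D42).
# Assumption: these are the only sequences that need the workaround.
_PATTERN = re.compile("(\u0d4d\u0d30)([\u0d3e-\u0d42])")


def insert_zws(text):
    """Insert a ZWS between a conjunct-final RA and a right-side vowel sign."""
    return _PATTERN.sub(lambda m: m.group(1) + ZWS + m.group(2), text)


if __name__ == "__main__":
    import sys

    # Filter mode: read UTF-8 text on stdin, write the patched text to stdout.
    sys.stdout.write(insert_zws(sys.stdin.read()))
```

Running the function a second time changes nothing, because after the first 
pass the RA is followed by the ZWS and no longer directly by the vowel sign. 
Words such as example 5, where the vowel sign sits on the left, are left 
untouched.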

Dear Hans and other developers of ConTeXt and LuaTeX, 
if you happen to see this, please look into the font system (where it concerns 
Indic scripts). The present issue is very similar to the one I posted about 
earlier:
https://www.mail-archive.com/ntg-context@ntg.nl/msg99691.html
I have described the issue and the hacks to fix it as best I can. If there is 
any other information that I can provide, please let me know.

Best,
kauśika



