ConTeXt hyphenation patterns for Indic languages
Hello,
I have tried to generate the hyphenation patterns for the Malayalam language from the TeX hyphenation patterns, but the Unicode characters were skipped. On TeXLive 2013, ConTeXt version 2013.04.20, this is what I tried:
1. Uncommented the line for "ml" in mtx-patterns.lua.
2. Ran "mtxrun --script patterns --convert --path=/usr/share/texlive/texmf-dist/tex/generic/hyph-utf8/patterns/txt/ --destination=/usr/share/texlive/texmf-dist/tex/context/patterns/".
3. The output shows that all patterns are removed, with messages similar to: "mtx-patterns | removing line with suspected utf character അ (0x0D05), category lo: 1അ1".
What is the proper way to add hyphenation patterns for a new language?
In addition, how do I add a 'new language' to the ConTeXt base (such that \language[ml] can be used)? I see "base/lang-ind.mkii", where I could add "\setupheadtext" etc. for the language; what is the equivalent for mkiv?
Thanks for any answers or pointers.
--
Regards,
Rajeesh
On 12/14/2013 1:06 PM, Rajeesh K Nambiar wrote:
> In addition, how to add a 'new language' to the ConTeXt base (such that \language[ml] can be used)? I see "base/lang-ind.mkii" where I could add "\setupheadtext" etc for the language, what is the equivalent for mkiv?
we need an entry in lang-def.mkiv
i uploaded a beta that has the patterns and language definitions .. but up to you to check it and provide better settings if needed
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
On Sat, Dec 14, 2013 at 1:47 PM, Hans Hagen wrote:
> we need an entry in lang-def.mkiv
I was fiddling with lang-def.mkiv, but an "undefined control sequence" error was stopping me; I will try to figure out the error.
> i uploaded a beta that has the patterns and language definitions .. but up to you to check it and provide better settings if needed
Okay, is the "Standalone ConTeXt" mechanism the best way to install the beta? If so, is it advisable to uninstall ConTeXt from TeXLive? (The wiki says ConTeXt Standalone for Linux 32-bit is compiled with glibc-2.3.6, while I have glibc-2.17.)
On 12/14/2013 2:17 PM, Rajeesh K Nambiar wrote:
> I was fiddling with lang-def.mkiv but "undefined control sequence" error was stopping me, will try to figure out the error.
probably because you used \??ml which was not yet defined
> Okay, is the best way to install beta to use the "Standalone ConTeXt" mechanism? Is it advisable to uninstall ConTeXt from TeXLive if so?
just install it in a different place and it will run independently from texlive
Hans
> i uploaded a beta that has the patterns and language definitions .. but up to you to check it and provide better settings if needed
Thanks, Hans! Now mkiv correctly loads the patterns and hyphenation works (though even with \language[ml], the log says "language 'en' is active"). Is there a way to also enable mkii/texexec to load the patterns? I copied the section from lang-def.mkiv into lang-ind.mkii and also added an entry in lang-def.lua, but the log does not show the patterns for ml being loaded, and it also reports "language ml is undefined".
> just install it in a different place and it will run independently from texlive
I installed it in a separate location and it works fine.
On 18.12.2013 at 23:17, Rajeesh K Nambiar wrote:
> Thanks, Hans! Now mkiv correctly loads the patterns and hyphenation works (though even with \language[ml], log says "language 'en' is active").
Use \mainlanguage[ml] to change the language.
Wolfgang
On 12/18/2013 11:17 PM, Rajeesh K Nambiar wrote:
> Thanks, Hans! Now mkiv correctly loads the patterns and hyphenation works (though even with \language[ml], log says "language 'en' is active").
\language is a local command, use \mainlanguage instead
> Is there a way to also enable mkii/texexec to load the patterns? I did copy-paste the section from lang-def.mkiv to lang-ind.mkii and
probably some more is needed, but in mkii you then also need to use utf 8 i guess and have the right 8 bit fonts
> also added an entry in lang-def.lua but the log does not show pattern for ml loaded, in addition to mentioning "language ml is undefined".
forget about mkii, using 8 bit tex is a bit of a nightmare for ml i think
Hans
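The difference between the two commands mentioned above can be sketched as follows. This is only an illustration, assuming the Rachana font used elsewhere in this thread is installed and that the ml patterns from the beta are available; the English filler text is made up for the example.

```tex
% \mainlanguage sets the language for the document as a whole.
\mainlanguage[en]

\starttext

English text before the switch.

% \language is local: inside the group, Malayalam is the current
% language; after the closing brace, English applies again.
{\language[ml]\definedfont[file:rachana*default]
 കോണ്ടെക്സ്റ്റില് മലയാളം ടൈപ്പ്സെറ്റ് ചെയ്തത്}

English text after the switch.

\stoptext
```

For a document that is entirely in Malayalam, \mainlanguage[ml] on its own is the simpler choice.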
On Wed, Dec 18, 2013 at 11:25 PM, Hans Hagen wrote:
> \language is a local command, use \mainlanguage instead
It does not seem to have an effect here. MWE:
\starttext
%\language[ml]
\mainlanguage[ml]
\definedfont[file:rachana*default]
കോണ്ടെക്സ്റ്റില് മലയാളം ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
The font, Rachana, can be found here: http://download.savannah.gnu.org/releases/smc/fonts/malayalam-fonts-6.0/Rach...
> forget about mkii, using 8 bit tex is a bit of a nightmare for ml i think
Indeed, but I am using mkii with the XeTeX backend (texexec --xetex). I'm trying to make Malayalam shaping work in mkiv (taking the Devanagari font-odv.lua as an example), but it is quite complex; until then I'm depending on mkii+xetex.
On 12/18/2013 11:35 PM, Rajeesh K Nambiar wrote:
> Does not seem to have an effect here.
\starttext
\mainlanguage[ml]
\definedfont[file:rachana*default]
\setuplayout[width=4cm]
\setupalign[tolerant]
\showframe
\setuplanguage[ml][lefthyphenmin=1,righthyphenmin=1]
കോണ്ടെക്സ്റ്റില് മലയാളം ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
i see one hyphen ... how good are the patterns?
> i see one hyphen ... how good are the patterns?
Sorry for the confusion: hyphenation indeed works correctly (I confirmed it earlier). I was just noticing that the log file still mentions "language 'en' is active", which is harmless, I guess.
On 12/19/2013 6:08 PM, Rajeesh K Nambiar wrote:
> Sorry for the confusion - hyphenation indeed works correctly (I confirmed it earlier), but I was just noticing that the logfile still mentions "language 'en' is active" which is harmless I guess.
normally you set the mainlanguage before \starttext
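Applied to the example from earlier in the thread, that advice might look like the sketch below. It only reorders commands already shown in this thread so that the setup comes before \starttext; it assumes the Rachana font is installed, and it has not been checked against a particular ConTeXt release.

```tex
% Setup goes before \starttext.
\mainlanguage[ml]
\setuplanguage[ml][lefthyphenmin=1,righthyphenmin=1]
\setuplayout[width=4cm]   % narrow text block to force line breaks
\setupalign[tolerant]
\showframe                % draw the layout frame

\starttext
\definedfont[file:rachana*default]
കോണ്ടെക്സ്റ്റില് മലയാളം ടൈപ്പ്സെറ്റ് ചെയ്തത്
\stoptext
```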
On 19.12.2013 at 19:09, Hans Hagen wrote:
> normally you set the mainlanguage before \starttext
Even then the message reports English, because it is produced by the \initializemainlanguage command (lang-ini.mkiv):
\unexpanded\def\initializemainlanguage
  {\mainlanguage[\currentlanguage]%
   \showmessage\m!languages9\currentlanguage}
which is stored in the \everyjob register (core-def.mkiv). A better place for the message would be in the definition of the \mainlanguage command.
Wolfgang
participants (3)
- Hans Hagen
- Rajeesh K Nambiar
- Wolfgang Schuster