On Tue, May 14, 2013 at 5:59 PM, Theppitak Karoonboonyanan <theppitak@gmail.com> wrote:
On Tue, May 14, 2013 at 9:58 PM, luigi scarso wrote:
On Tue, May 14, 2013 at 4:16 PM, Mojca Miklavec wrote:
I could also ask differently: suppose that a motivated Thai programmer would be willing to work on solving the problem properly. What would be the suggested solution?
You can also post on the ConTeXt mailing list; maybe there is a Thai user there.
I am a Thai developer who works on Thai word segmentation tools and the thailatex package. So you can send suggestions to me. (Please Cc: me, I'm not on the mailing list.)
I'm totally new to LuaTeX and the Lua programming language, but I can learn whatever is necessary to get it done.
With a quick search, I saw the "linebreak_filter" callback in the LuaTeX reference. Is that relevant to the problem? Or is using an external filter already acceptable?
Regards, -- Theppitak Karoonboonyanan http://linux.thai.net/~thep/
I hope that someone can help here
-- luigi
On 5/14/2013 6:07 PM, luigi scarso wrote:
I hope that someone can help here
as Mojca mentioned thai at bachotex i'll add the patterns as a start
given specs, examples and time, adding support for thai to context shouldn't be too hard (assuming that there are users)
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
On Tue, May 14, 2013 at 6:17 PM, Hans Hagen wrote:
On 5/14/2013 6:07 PM, luigi scarso wrote:
I hope that someone can help here
as Mojca mentioned thai at bachotex i'll add the patterns as a start
given specs, examples and time, adding support for thai to context shouldn't be too hard (assuming that there are users)
But it's not trivial either.
There's an open-source project implementing word segmentation: http://linux.thai.net/projects/swath
The specification (someone's thesis) can be found here: http://www.cs.cmu.edu/~paisarn/papers/thesis99.pdf
The ugly part of the pdfTeX approach is that it requires an external text processor to digest an input TeX document and return a copy with word segmentation. Then pdfTeX is run on the resulting file. XeTeX can use the ICU library to do the segmentation.
In LuaTeX one would have to plug the word segmentation in somewhere (but writing that part is slightly non-trivial).
Mojca
On 5/15/2013 4:09 PM, Mojca Miklavec wrote:
On Tue, May 14, 2013 at 6:17 PM, Hans Hagen wrote:
On 5/14/2013 6:07 PM, luigi scarso wrote:
I hope that someone can help here
as Mojca mentioned thai at bachotex i'll add the patterns as a start
given specs, examples and time, adding support for thai to context shouldn't be too hard (assuming that there are users)
But it's not trivial either.
It depends ... we're using a dictionary to determine word boundaries, aren't we? I'm pretty sure that I've done more complex coding.
There's an open-source project implementing word segmentation: http://linux.thai.net/projects/swath
The specification (someone's thesis) can be found here: http://www.cs.cmu.edu/~paisarn/papers/thesis99.pdf
Ok, so there are some text files there with words.
The ugly part of the pdfTeX approach is that it requires an external text processor to digest an input TeX document and return a copy with word segmentation. Then pdfTeX is run on the resulting file. XeTeX can use the ICU library to do the segmentation.
In LuaTeX one would have to plug the word segmentation in somewhere (but writing that part is slightly non-trivial).
I just did a quick test using those dictionaries (abusing some code that I already had on my machine). Quite doable. It all depends on having the dictionaries available (on the garden or in the distribution).
Anyhow, it's not that much font related, just language / script support, and we already have that for some languages, so adding Thai to it doesn't hurt.
Of course we'd need some testing. It doesn't make much sense to add features to ConTeXt that no one would use at some point. But ... Luigi is already teaching himself Thai, so ...
Hans
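The dictionary-based approach discussed in this thread can be illustrated with a minimal sketch. It is written in Python for brevity and is not the code Hans tested, not swath's algorithm, and not a LuaTeX callback. It assumes a plain set of known words and uses greedy longest matching, which is only one of several possible strategies. The names `segment`, `insert_breaks` and `THAI_WORDS`, the three-word toy dictionary, and the zero-width space as default break marker are all assumptions made for illustration.

```python
def segment(text, dictionary):
    """Split text into words by greedy longest match against a set of known words.

    Characters that start no dictionary word are emitted as one-character tokens.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary word matches.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                break
        else:
            # No dictionary word starts here: fall back to a single character.
            j = i + 1
        words.append(text[i:j])
        i = j
    return words


def insert_breaks(text, dictionary, separator="\u200b"):
    """Return text with a break marker between segmented words.

    The default marker (zero-width space) is an assumption; an external
    preprocessor would emit whatever the typesetting engine expects.
    """
    return separator.join(segment(text, dictionary))


# Hypothetical toy dictionary; a real one would be loaded from word-list files.
THAI_WORDS = {"สวัสดี", "ภาษา", "ไทย"}

if __name__ == "__main__":
    print(segment("สวัสดีภาษาไทย", THAI_WORDS))
    print(insert_breaks("สวัสดีภาษาไทย", THAI_WORDS, separator="|"))
```

Greedy matching can pick a long word that leaves an unparseable remainder (for example, with the words "the", "there" and "rein", the input "therein" comes out as "there", "i", "n"), which is one reason real segmenters use more elaborate strategies than this sketch.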
participants (3)
- Hans Hagen
- luigi scarso
- Mojca Miklavec