Adding built-in support for Serbian language
Hello all,

I have recently started using ConTeXt. I've found that the distribution includes a proper (cyrillic) hyphenation file for Serbian language, but a complete language support is still not implemented. Therefore, I've added what I think is required, did some testing by putting changed files in my texmf-local, and the result looks fine.

There is only one thing that requires a decision from the development team. Serbian language uses two scripts: cyrillic and latin. Context language codes are using 2 letters for identification. So I'm not sure how to include both scripts. What I'm sending now is a cyrillic script implementation, using the code "sr". It is trivial to generate (completely automatic) latin script version of these changes, once it is decided how to label it.

Best regards, Ivan
Dear Ivan, On Fri, 30 Oct 2020 at 11:32, Ivan Pešić wrote:
Hello all, I have recently started using ConTeXt.
Welcome!
I've found that the distribution includes a proper (cyrillic) hyphenation file for Serbian language,
I would say that this needs to be changed/improved. There's no reason why it wouldn't load both scripts at the same time (at least for Unicode engines, which is the only thing that's currently supported anyway). This is what XeTeX loads, for example:

https://github.com/hyphenation/tex-hyphen/blob/master/hyph-utf8/tex/generic/...

    \input hyph-sh-latn.tex
    \input hyph-sh-cyrl.tex

That is: it loads both patterns at the same time.

Hans, would you be willing to merge two sets of hyphenation patterns together? Alternatively maybe we could prepare hyph-sh.pat.txt on the hyph-utf8 side? I'm actually not sure why we didn't do that already, but maybe it was because we have two sets of cyrillic patterns and it has never been a clear cut which ones to take.

The author of hyph-sh-[latn|cyrl] says that his patterns should work universally for multiple languages (they are relatively old), but they were initially only released for the Latin scripts. Later another author wanted to have support for Cyrillic script and prepared his own patterns (I'm no longer sure whether they were partially based on the other ones) without the Latin alternative. In Xe(La)TeX and Lua(La)TeX we use the "sh" patterns for both, for consistency reasons, among others. (You likely want the same word to be hyphenated in the same way in both scripts).
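To make the merging idea concrete, here is a minimal Python sketch of combining two pattern lists into one. It assumes each pattern set is plain text with one pattern per line and that lines starting with % are comments. The function name merge_patterns and the sample patterns are made up for illustration; they are not taken from hyph-sh-latn or hyph-sh-cyrl.

```python
def merge_patterns(*pattern_sets):
    """Merge several pattern lists, keeping first-seen order and dropping duplicates."""
    seen = set()
    merged = []
    for patterns in pattern_sets:
        for line in patterns:
            pattern = line.strip()
            # Skip blank lines and %-style comment lines (an assumption about the input).
            if not pattern or pattern.startswith("%"):
                continue
            if pattern not in seen:
                seen.add(pattern)
                merged.append(pattern)
    return merged


# Hypothetical sample patterns, not real Serbian hyphenation patterns.
latin = ["a1b", "", "% a comment", "1ba"]
cyrillic = ["а1б", "1ба"]  # Cyrillic letters, distinct from the Latin ones above

merged = merge_patterns(latin, cyrillic)
```

Since the Latin and Cyrillic alphabets share no letters, a pattern from one set can never match text in the other script, so a merged list of this kind should behave like loading both sets one after the other.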
but a complete language support is still not implemented. Therefore, I've added what I think is required, did some testing by putting changed files in my texmf-local, and the result looks fine.
Awesome, thank you.
There is only one thing that requires a decision from the development team. Serbian language uses two scripts: cyrillic and latin. Context language codes are using 2 letters for identification. So I'm not sure how to include both scripts.
(Unless there are plans to transliterate the translations on the fly :) there should be two independent files. One should use the code sr-latn and the other one sr-cyrl. A two-letter code simply doesn't work in this situation, and we should not even try to support only a single script, or attempt to decide which one should be the default. Both should be supported equally well.
What I'm sending now is a cyrillic script implementation, using the code "sr".
It is trivial to generate (completely automatic) latin script version of these changes, once it is decided how to label it.
Would you be willing to also prepare the latin one then? The codes should be sorted out by Hans (potentially with some help), but we definitely want to use "sr-latn" and "sr-cyrl". For the longer names there is some more freedom. LaTeX uses "serbianl" and "serbianc", I think, but I believe we can come up with something nicer. Maybe something along the lines of the following?

    \mainlanguage[serbian][script=latn]

or

    \mainlanguage[serbian-latin]
    \mainlanguage[serbian-cyrillic]

No clue, really.

Thank you, Mojca

(PS: I would say that adding support for transliteration of the text from one script to the other would be a really nice feature. Then you could type your text for a book once and have it typeset in both versions without any extra effort :)
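As an illustration of why the Cyrillic-to-Latin direction can be fully automatic, here is a minimal Python sketch of such a transliteration. It is not how ConTeXt or any existing module does it; the function name to_latin and the capitalisation rule for digraphs (Љ, Њ, Џ) are assumptions made for this example.

```python
# Serbian Cyrillic -> Latin, lowercase letters only; case is handled in to_latin().
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}


def to_latin(text: str) -> str:
    """Transliterate Serbian Cyrillic text to Latin; other characters pass through."""
    out = []
    for i, ch in enumerate(text):
        lower = ch.lower()
        if lower not in CYR_TO_LAT:
            out.append(ch)
            continue
        latin = CYR_TO_LAT[lower]
        if ch != lower:  # the source letter was uppercase
            nxt = text[i + 1] if i + 1 < len(text) else ""
            prev = text[i - 1] if i > 0 else ""
            # Assumed rule: inside an all-caps run write "LJ", otherwise "Lj".
            all_caps = nxt.isupper() or (not nxt.isalpha() and prev.isupper())
            latin = latin.upper() if all_caps else latin.capitalize()
        out.append(latin)
    return "".join(out)


print(to_latin("Српски језик"))
```

The opposite direction is less mechanical, because Latin "nj", "lj" and "dž" are not always digraphs (for example "injekcija" is written with н followed by ј), so a Latin-to-Cyrillic converter would need an exception list.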
On 10/30/2020 1:42 PM, Mojca Miklavec wrote:
I would say that this needs to be changed/improved. There's no reason why it wouldn't load both scripts at the same time (at least for Unicode engines, which is the only thing that's currently supported anyway).
i'll look into it once i've finished some new stuff (in the middle of it)
(PS: I would say that adding support for transliteration of the text from one script to the other would be a really nice feature. Then you could type your text for a book once and have it typeset in both versions without any extra effort :)
just gimme the specs ... sounds like some nice distraction for a rainy weekend
Hans ----------------------------------------------------------------- Hans Hagen | PRAGMA ADE Ridderstraat 27 | 8061 GH Hasselt | The Netherlands tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl -----------------------------------------------------------------
(PS: I would say that adding support for transliteration of the text from one script to the other would be a really nice feature. Then you could type your text for a book once and have it typeset in both versions without any extra effort :)
There is Philipp Gesang's transliterator package: https://gitlab.com/phgsng/transliterator https://modules.contextgarden.net/cgi-bin/module.cgi/ruid=199735311/action=v... Cheers, Henri
Hi Mojca,
\input hyph-sh-latn.tex \input hyph-sh-cyrl.tex That is: it loads both patterns at the same time.
Hans, would you be willing to merge two sets of hyphenation patterns together? Alternatively maybe we could prepare hyph-sh.pat.txt on the hyph-utf8 side? I'm actually not sure why we didn't do that already, but maybe it was because we have two sets of cyrillic patterns and it has never been a clear cut which ones to take.
I think that a merged file is the most natural approach (isn't it "sr" instead of "sh"?). I can of course add all kinds of code for merging, but at some point I guess a merged file will be used anyway.
Hans
participants (4)
- Hans Hagen
- Henri Menke
- Ivan Pešić
- Mojca Miklavec