Greek: GR or EL? Czech: CZ or CS? UK: Ukrainian or ...
Hello,
I have noticed that ConTeXt uses "gr" for Greek, but the ISO code seems to be "el". Less problematic: should agr be grc instead? (OpenType uses PGR, but I'm not sure if that's the same thing.) What do the Greek experts say?
Well, English is a story on its own. "us" and "uk" don't have their own codes as separate languages, and even worse: uk should stand for Ukrainian!!!
"Norwegian" (which is not a language at all) should be patched (according to an old user request) once.
A similar problem exists with:
- Chinese (cn instead of zh)
- Czech (cz instead of cs)
- Vietnamese (vn instead of vi)
- Ukrainian (ua instead of uk!!!)
A case where I have no opinion:
- deo
Some languages have already changed their codes in the past:
- Spanish: sp -> es
- German: du -> de
- Slovenian: si -> sl (no trace left, I hope :)
My proposal would be to change:
- gr -> el
- agr -> grc
- cz -> cs
- vn -> vi
- deo -> ? (if at all)
  gmh - German, Middle High (ca.1050-1500)
  goh - German, Old High (ca.750-1050)
- cn -> zh (with *lots of care*)
And to keep all the needed synonyms. (Besides that: to issue a warning if possible.)
I have no idea what to do with Ukrainian and UK though.
-------------
Another issue: some languages need some little modifications or alternatives:
1.) In German, Slovenian, Croatian (maybe in other languages as well) one can use two types of quotes:
- „" U+201E/U+201C & ‚' U+201A/U+2018 (sorry, a bug in gmail reencodes them)
- »« U+00BB/U+00AB & ›‹ U+203A/U+2039
It might make sense to be able to say something similar to
\mainlanguage [german] [quotes | quotationmarks | quotationstyle = guillemots | guillemets or comma | ninesix]
2.) I could imagine a Serbian user requesting to be able to typeset in two scripts (Latin or Cyrillic). That means:
- different labels
- loading different hyphenation patterns
(even though transcription in either direction can be made on the fly - I can confirm that a user has already asked me if I know how to input text in Cyrillic and get output in Latin - as he wasn't fluent in reading Cyrillic, he wanted to misuse ConTeXt to help him read texts from the web)
So I could imagine making Cyrillic the default script, but still letting one use
\mainlanguage
  [serbian]
  [script=latin,
   % or even (if any user would be enthusiastic enough to provide code)
   transliteration=on]
and get Latin labels and hyphenation patterns.
3.) Solve the problem with English in a more elegant way:
\mainlanguage [english] [alternative=us]
or
\mainlanguage[en][US] % as in "en_US.UTF-8"
\mainlanguage[en][GB]
\mainlanguage[en][AU]
\mainlanguage[de][AT] % if one ever figures out that "German from Germany" isn't good enough
Then, [us] should be kept as a synonym for \mainlanguage[en][US].
(The examples above could also be called via \mainlanguage[de][alternative=guillemets] or \mainlanguage[sr][alternative=latin].)
4.) deo
\mainlanguage[de][alternative=old] ??? (no idea what that is about)
Note that 1.) could be combined (should be "combinable") with this one.
Any thoughts?
Mojca
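The "keep all the needed synonyms and issue a warning" idea from the proposal above could look roughly like the following. This is only a minimal sketch in Python, not ConTeXt code: the table and the function name resolve_language_code are hypothetical, and the mapping contains only the renames listed in the proposal (with "zh", the ISO 639-1 code for Chinese, as the target for cn).

```python
import warnings

# Legacy ConTeXt code -> ISO 639 code, as listed in the proposal.
# Hypothetical table for illustration; not taken from the ConTeXt sources.
LEGACY_TO_ISO = {
    "gr": "el",    # Greek
    "agr": "grc",  # Ancient Greek
    "cz": "cs",    # Czech
    "vn": "vi",    # Vietnamese
    "cn": "zh",    # Chinese
}


def resolve_language_code(code):
    """Return the ISO code for `code`, warning when a legacy synonym is used."""
    key = code.strip().lower()
    if key in LEGACY_TO_ISO:
        iso = LEGACY_TO_ISO[key]
        warnings.warn(
            f"language code '{key}' is a legacy synonym; use '{iso}' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return iso
    # Codes that are not legacy synonyms are passed through unchanged.
    return key
```

The "ua"/"uk" pair is deliberately left out of the table: as noted above, "uk" is already in use for British English, so a plain rename would collide.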
On Dec 5, 2007, at 2:40 AM, Mojca Miklavec wrote:
I have noticed that ConTeXt uses "gr" for Greek, but the ISO code seems to be "el". Less problematic: should agr be grc instead? (OpenType uses PGR, but I'm not sure if that's the same thing.)
What do the Greek experts say?
Hi Mojca, I have no strong opinion regarding gr/el, but what would "grc" stand for? Thomas
I have no strong opinion regarding gr/el, but what would "grc" stand for?
It's the ISO-639-2 alpha-3 code for "Greek, Ancient (to 1453)" -- May 29th, I believe ;-) See http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt Arthur
On Dec 5, 2007, at 9:20 AM, Arthur Reutenauer wrote:
It's the ISO-639-2 alpha-3 code for "Greek, Ancient (to 1453)" -- May 29th, I believe ;-)
See http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt
Arthur
Ah, thanks! In that case, yes, let's go for grc. I had no idea ISO was working retroactively as well... Hans, I will look for "agr" in the sources and send you patches, OK? Thomas
Thomas A. Schmitz wrote:
On Dec 5, 2007, at 9:20 AM, Arthur Reutenauer wrote:
It's the ISO-639-2 alpha-3 code for "Greek, Ancient (to 1453)" -- May 29th, I believe ;-)
See http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt
Arthur
Ah, thanks! In that case, yes, let's go for grc. I had no idea ISO was working retroactively as well... Hans, I will look for "agr" in the sources and send you patches, OK?
sure ----------------------------------------------------------------- Hans Hagen | PRAGMA ADE Ridderstraat 27 | 8061 GH Hasselt | The Netherlands tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl -----------------------------------------------------------------
2007/12/5, Mojca Miklavec
Hello,
I have noticed that ConTeXt uses "gr" for Greek, but the ISO code seems to be "el". Less problematic: should agr be grc instead? (OpenType uses PGR, but I'm not sure if that's the same thing.)
What do the Greek experts say?
Well, English is a story on its own. "us" and "uk" don't have their own codes as a separate language, and even worse: uk should stand for Ukrainian!!!
"Norwegian" (which is not a language at all) should be patched (according to an old user request) once.
A similar problem exists with: - Chinese (cn instead of zh) - Czech (cz instead of cs) - Vietnamese (vn instead of vi) - Ukrainian (ua instead of uk!!!)
A case where I have no opinion: - deo
Some languages have already changed their codes in the past: - Spanish: sp -> es - German: du -> de - Slovenian: si -> sl (no trace left, I hope :)
My proposal would be to change: - gr -> el - agr -> grc - cz -> cs - vn -> vi - deo -> ? (if at all) gmh - German, Middle High (ca.1050-1500) goh - German, Old High (ca.750-1050) - cn -> zh (with *lots of care*)
And to keep all the needed synonyms. (Besides that: to issue a warning if possible.)
I have no idea what to do with Ukrainian and UK though.
-------------
Another issue: some languages need some little modifications or alternatives:
1.) In German, Slovenian, Croatian, (maybe in other languages as well) ... one can use two types of quotes: - „" U+201E/U+201C & ‚' U+201A/U+2018 (sorry, a bug in gmail reencodes them) - »« U+00BB/U+00AB & ›‹ U+203A/U+2039
It is also common to write «text ‹text› text» in German.
It might make sense to be able to say something similar to \mainlanguage [german] [quotes | quotationmarks | quotationstyle = guillemots | guillemets or comma | ninesix]
2.) I could imagine a Serbian user requesting to be able to typeset in two scripts (Latin or Cyrillic). That means: - different labels - loading different hyphenation patterns (even though transcription in either direction can be made on the fly - I can confirm that a user has already asked me if I know how to input text in Cyrillic and get output in Latin - as he wasn't fluent in reading Cyrillic, he wanted to misuse ConTeXt to help him read texts from the web)
So I could imagine making Cyrillic the default script, but still letting one use
\mainlanguage
  [serbian]
  [script=latin,
   % or even (if any user would be enthusiastic enough to provide code)
   transliteration=on]
and get latin labels and hyphenation patterns.
3.) Solve the problem with English in a more elegant way:
\mainlanguage [english] [alternative=us]
or
\mainlanguage[en][US] % as in "en_US.UTF-8"
\mainlanguage[en][GB]
\mainlanguage[en][AU]
\mainlanguage[de][AT] % if one ever figures out that "German from Germany" isn't good enough
Then, [us] should be kept as a synonym for \mainlanguage[en][US].
(The examples above could also be called via \mainlanguage[de][alternative=guillemets] or
As mentioned above this won't work.
\mainlanguage[sr][alternative=latin].)
4.) deo \mainlanguage[de][alternative=old] ??? (no idea what that is about)
The old rules shouldn't be used any longer :-)
Note that 1.) could be combined (should be "combinable") with this one.
Any thoughts?
I think we should keep the current syntax with mkii and allow better control in the mkiv code. Wolfgang
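Point 3.) of the quoted proposal borrows the shape of POSIX locale names such as "en_US.UTF-8". A minimal Python sketch of how such a tag could be split into language and region, while keeping [us] as a synonym, might look like this. The function name parse_language_request and the synonym table are hypothetical; nothing here reflects how mkii or mkiv actually handle \mainlanguage.

```python
import re

# Hypothetical legacy synonyms; only "us" is named in the proposal.
LEGACY_SYNONYMS = {"us": ("en", "US")}

_LOCALE_RE = re.compile(
    r"^(?P<language>[a-z]{2,3})"           # ISO 639 language code
    r"(?:_(?P<region>[A-Z]{2}))?"          # optional ISO 3166 region
    r"(?:\.(?P<codeset>[A-Za-z0-9-]+))?$"  # optional codeset, e.g. UTF-8
)


def parse_language_request(tag):
    """Split a tag such as 'en_US.UTF-8' into (language, region)."""
    # Legacy synonyms win, so that "us" keeps meaning American English.
    if tag in LEGACY_SYNONYMS:
        return LEGACY_SYNONYMS[tag]
    match = _LOCALE_RE.match(tag)
    if match is None:
        raise ValueError(f"unrecognised language tag: {tag!r}")
    # The codeset is accepted but ignored; region is None when absent.
    return match.group("language"), match.group("region")
```

Checking the synonym table before the pattern matters: "us" on its own would otherwise be read as a bare two-letter language code.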
2007/12/5, Mojca Miklavec:
- deo -> ? (if at all) gmh - German, Middle High (ca.1050-1500) goh - German, Old High (ca.750-1050)
No! "deo" is modern German in old orthography (pre-2005).
1.) In German, Slovenian, Croatian, (maybe in other languages as well) ... one can use two types of quotes: - „" U+201E/U+201C & ‚' U+201A/U+2018 (sorry, a bug in gmail reencodes them) - »« U+00BB/U+00AB & ›‹ U+203A/U+2039
It is also common to write «text ‹text› text» in German.
That's Swiss (de_CH). In de_DE the guillemets point inwards. (Difference between de_CH and fr: in French they have French spacing, i.e. space between punctuation and word.)
4.) deo \mainlanguage[de][alternative=old] ??? (no idea what that is about)
The old rules should't be used any longer :-)
Don't be too hasty: you might want to typeset older texts in their typography. Or you might work for the FAZ guys.
I think we should keep the current syntax with mkii and allow better control in the mkiv code.
It's a pity that ConTeXt uses non-standard codes at all. But I guess we should keep backwards compatibility. Perhaps we could introduce the ISO codes as ISO.de_DE or the like? Greetlings, Hraban
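The quote conventions discussed in this message and the ones before it (comma-style quotes, inward-pointing guillemets for de_DE, outward-pointing ones for de_CH) can be tabulated per language and style. The following Python sketch only illustrates such a table: the keys, the style names "comma" and "guillemets" (taken from the proposed option values), and the function quote are assumptions, not an existing ConTeXt interface. French spacing is not modelled.

```python
# (language, style) -> ((outer open, outer close), (inner open, inner close))
# Hypothetical table; code points are the ones quoted in the thread.
QUOTE_STYLES = {
    ("de", "comma"): (("\u201e", "\u201c"), ("\u201a", "\u2018")),
    # de_DE guillemets point inwards: »text«
    ("de", "guillemets"): (("\u00bb", "\u00ab"), ("\u203a", "\u2039")),
    # de_CH guillemets point outwards: «text»
    ("de_CH", "guillemets"): (("\u00ab", "\u00bb"), ("\u2039", "\u203a")),
}


def quote(text, language="de", style="comma", level=1):
    """Wrap `text` in outer (level 1) or inner (level 2) quotation marks."""
    outer, inner = QUOTE_STYLES[(language, style)]
    opening, closing = outer if level == 1 else inner
    return f"{opening}{text}{closing}"
```

For example, quote("Text", "de", "guillemets") gives »Text«, while quote("Text", "de_CH", "guillemets") gives «Text».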
No! "deo" is modern German in old orthography (pre-2005).
OK, so I guess that's what RFC 4646 suggests de-1996 for -- I suppose the reform was first introduced in 1996 and adopted only later? See ftp://ftp.rfc-editor.org/in-notes/rfc4646.txt, page 13 (Mojca, the preceding paragraph is for you ;-) Arthur
2007/12/5, Arthur Reutenauer
No! "deo" is modern German in old orthography (pre-2005).
OK, so I guess that's what RFC 4646 suggests de-1996 for -- I suppose the reform was first introduced in 1996 and adopted only later? See ftp://ftp.rfc-editor.org/in-notes/rfc4646.txt, page 13 (Mojca, the preceding paragraph is for you ;-)
Maybe; we're now on the 42nd reform of the reform - a lot is back to pre-reform orthography (or both styles are allowed). 2005 was only an estimate - AFAIR in 2005 the reformed orthography became standard for schools and public authorities. Greetlings, Hraban
Mojca Miklavec wrote:
Hello,
I have noticed that ConTeXt uses "gr" for Greek, but the ISO code seems to be "el". Less problematic: should agr be grc instead? (OpenType uses PGR, but I'm not sure if that's the same thing.)
What do the Greek experts say?
etc etc
note: this language inventory is part of introducing dynamic feature support in mkiv, which boils down to automatic adaption to languages and such; for this we need relationships between scripts, languages, otf tags etc
Hans
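The kind of inventory described here, relating language codes to scripts and OpenType tags, could be modelled along the following lines. This Python sketch is purely illustrative and does not reflect the mkiv implementation: LanguageEntry, INVENTORY and otf_tags are hypothetical names. The tags shown are entries from the OpenType script and language-system tag registries; pairing grc with PGR (Polytonic Greek) is an assumption, since the thread leaves open whether the two really match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageEntry:
    iso: str           # ISO 639 language code
    scripts: tuple     # OpenType script tags, default script first
    otf_language: str  # OpenType language-system tag


# Hypothetical inventory with a handful of languages from the thread.
INVENTORY = {
    entry.iso: entry
    for entry in (
        LanguageEntry("el", ("grek",), "ELL"),
        LanguageEntry("grc", ("grek",), "PGR"),  # assumption, see lead-in
        LanguageEntry("cs", ("latn",), "CSY"),
        LanguageEntry("de", ("latn",), "DEU"),
        # Cyrillic first, following the suggestion to make it the default.
        LanguageEntry("sr", ("cyrl", "latn"), "SRB"),
    )
}


def otf_tags(iso, script=None):
    """Return (script tag, language tag) for a language, checking the script."""
    entry = INVENTORY[iso]
    chosen = script or entry.scripts[0]
    if chosen not in entry.scripts:
        raise ValueError(f"script {chosen!r} is not registered for {iso!r}")
    return chosen, entry.otf_language
```

With this table, otf_tags("sr") selects Cyrillic by default and otf_tags("sr", "latn") selects Latin, both with the same language tag.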
"Norwegian" (which is not a language at all)
Nobody reacted to that part, so I guess that means no one knowledgeable in Norwegian read it ... I wish to make sure that we did not by any means intend to insult Norway or Norwegian-speaking people ;-) We are only trying to sort things out and would like to know which form of the written language the Norwegian translation of the ConTeXt interface uses, namely bokmål ("literary language" a.k.a. riksmål a.k.a. "Danish Norwegian"), or nynorsk ("new Norwegian" a.k.a. landsmål). I suspect it's bokmål, but if someone could confirm that it would be very kind; just look for "\startmessages norwegian" in the ConTeXt sources. If the question does not make sense and the messages could be bokmål as well as nynorsk, that's also interesting information. I happen to have a small manual of Norwegian but it teaches only bokmål, so that doesn't help much (nor does my comprehensive edition of Grieg's songs :-) Arthur
On 12/5/07, Arthur Reutenauer wrote:
"Norwegian" (which is not a language at all)
Nobody reacted to that part, so I guess that means no one knowledgeable in Norwegian read it ... I wish to make sure that we did not by any means intend to insult Norway or Norwegian-speaking people ;-) We are only trying to sort things out and would like to know which form of the written language the Norwegian translation of the ConTeXt interface uses, namely bokmål ("literary language" a.k.a. riksmål a.k.a. "Danish Norwegian"), or nynorsk ("new Norwegian" a.k.a. landsmål).
I suspect it's bokmål, but if someone could confirm that it would be very kind; just look for "\startmessages norwegian" in the ConTeXt sources. If the question does not make sense and the messages could be bokmål as well as nynorsk, that's also interesting information. I happen to have a small manual of Norwegian but it teaches only bokmål, so that doesn't help much (nor does my comprehensive edition of Grieg's songs :-)
I have patches written by Karl Ove Hufthammer (on 4th February, 2007). See
[dev-context] Improved support for Norwegian in ConTeXt
But we have never reached an agreement (never tried too hard) on how it should be implemented, and Karl didn't insist enough, so it has slipped out of focus. I still have a bad conscience about that.
Mojca
participants (6)
- Arthur Reutenauer
- Hans Hagen
- Henning Hraban Ramm
- Mojca Miklavec
- Thomas A. Schmitz
- Wolfgang Schuster