Hi Mojca,

an impressive summary ... just provide the patches needed, take a look at 'de' and 'deo' ... we can clone languages so definitions can be shared concerning conversions ... there are some language specific things, take a look at Chinese (s-chi*)
I would suggest that you post some of the questions to the ntg-context mailing list, where more Norwegian users can comment on them. When making changes, 100% backward compatibility might need to be sacrificed a bit, but some changes are worth making if others agree and if they contribute towards better quality. (I CC-ed two users who seem to have contributed or complained a bit ;)
On 2/4/07, Karl Ove Hufthammer wrote:
I'm writing this to suggest improvements in ConTeXt's support for the Norwegian languages. ConTeXt already has rudimentary support for Norwegian, but with some problems.
Language codes
--------------
The main problem is that ConTeXt uses the language code 'no' for Norwegian. There actually *is* no written language called 'Norwegian'; Norway has two official written languages, Norwegian Bokmål (ISO 639 language code 'nb') and Norwegian Nynorsk (ISO 639 language code 'nn'). The current definitions for 'no' in ConTeXt are for Norwegian Bokmål. (There is an ISO 639 language code 'no' for Norwegian, but this should usually be used for spoken Norwegian, or perhaps for transcriptions of spoken language.)
The language code 'no' should be removed, and be replaced by the two language codes 'nb' and 'nn'.
Although I don't know the exact situation, a few remarks:
- You should probably also provide the correct definitions for calling the language (so that one can say \mainlanguage[norwegian], but perhaps with what you consider to be the proper language tags). It's currently

    \installlanguage [norwegian] [\s!no]
    \installlanguage [norsk] [\s!no] % bonus switch

  You need to fix the two and perhaps add

    \installlanguage [???] [\s!nb]
    \installlanguage [???] [\s!nn]
- If you remove [no], older documents might break. I don't know much about the situation and the number of users, but can you say which of the two language variants [no] should default to? Since the current definitions probably point to "nb" (at first glance), would it make sense to use "nb" when one says \mainlanguage[no]?
Perhaps one can issue a warning when the language "no" is selected (stating something like "language 'no' is deprecated, please use 'nb' for Bokmål or 'nn' for Nynorsk instead").
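A sketch of how the synonyms could look if 'nb' and 'nn' become the real languages and the old names are kept as aliases. It only reuses the two-argument \installlanguage form quoted above. The names 'bokmal' and 'nynorsk' are placeholders, not an agreed choice, and pointing 'no' at 'nb' is just the default suggested above; none of this exists in the distribution yet.

```tex
% Assumption: [nb] and [nn] have already been installed with full setups.
% The synonym names below are placeholders, not agreed-upon names.
\installlanguage [bokmal]    [nb]
\installlanguage [nynorsk]   [nn]

% Backward compatibility: keep the old names and let them select Bokmal,
% which is what the existing 'no' definitions appear to describe.
\installlanguage [no]        [nb]
\installlanguage [norwegian] [nb]
\installlanguage [norsk]     [nb]
```

With aliases like these, an old document saying \mainlanguage[no] would keep working, and the deprecation warning could be added later without breaking anything.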
I also asked to replace "si" by "sl" for Slovenian some time ago, but that was when there was no support for Slovenian yet and "si" stands for Singhalese (whatever that is).
For Norwegian the situation might be slightly different since "no" still means Norwegian, but I don't know how "offensive"/"ignorant" it sounds to you if that one is used.
Removing it probably doesn't affect the rest, so if other Norwegian users agree to remove it completely, it can still be done, but I would suggest that you first ask the author of the original translations and the rest of the users on the ntg-context mailing list. Otherwise it can still default to one of the two variants (or to a new one if you also provide a third alternative for the "spoken language").
See http://en.wikipedia.org/wiki/Norwegian_language for a (not too good) article on the Norwegian languages.
For the record, the language names used in LaTeX/Babel are (unfortunately) 'Norwegian' and 'norsk' for Norwegian Bokmål, and 'nynorsk' for Norwegian Nynorsk, instead of 'bokmal'/'bokmål' and 'nynorsk'. Norwegian Bokmål support was added first, and used up the 'Norwegian' name.
Hyphenation
-----------
The two written languages are quite similar, and the current hyphenation dictionary (nohyphbx) was made to support both. But there are (at least) two words which are put in the hyphenation exceptions for this dictionary because they would have different hyphenation (because of different meaning) in Norwegian Nynorsk and Norwegian Bokmål. These are:
  attende -- nb: at-ten-de ('eighteenth'), nn: att-en-de ('back')
  betre   -- nb: be-tre ('enter'/'set foot on'), nn: bet-re ('better')
Would it be possible to have two different hyphenation dictionaries for 'nb' and 'nn', which would only differ in the hyphenation exceptions used for these two words?
This can be done. Hans was complaining about the mess of (naming of) Norwegian hyphenation patterns one month ago anyway, so I guess that "he won't mind" adding yet another fix to the scripts ;)
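Until separate dictionaries exist, a document can override the two words itself. A minimal sketch that relies only on TeX's \hyphenation primitive, which adds exceptions for the language active at that point; it assumes the document has already selected whatever language currently loads nohyphbx.

```tex
% In a Nynorsk document, after the language has been selected:
\hyphenation{att-en-de bet-re}

% In a Bokmal document one would use this line instead:
% \hyphenation{at-ten-de be-tre}
```

This is only a per-document workaround; two pattern files that differ in their exception lists would make it unnecessary.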
Language setup
--------------
Here is an improved/correct version of the language setup for Norwegian. The setup for 'no' should be removed.
\installlanguage
  [nn]
  [spacing=packed,
   lefthyphenmin=2,
   righthyphenmin=2,
   leftsentence=---,
   rightsentence=---,
   leftsubsentence=---,
   rightsubsentence=---,
   leftquote=\upperleftsinglesixquote,
   rightquote=\upperrightsingleninequote,
   leftquotation=\leftguillemot,
   rightquotation=\rightguillemot,
   date={day,{.},\ ,month,\ ,year},
   state=stop]
This is for Norwegian Nynorsk ('nn'), but the same setup is used for Norwegian Bokmål (the values used for 'day' differ, though -- see below).
But I am not sure I understand what the four *sentence commands are used for. We usually don't use em-dashes in Norwegian, so the entries look incorrect. If you can explain what the commands are used for, I can supply the correct Norwegian definitions.
I also noticed that the Italian definitions use leftspeech, middlespeech and rightspeech commands. What are these used for?
Other language-specific settings
--------------------------------
Norwegian (Bokmål and Nynorsk) differs typographically from English in several other ways. Here are three of them:
We don't (usually) use bullets for the first level of unnumbered lists; we use en-dashes.
  -- Item 1
  -- Item 2
  -- Item 3
Bullets are commonly seen in documents created by word processors of US origin, and in documents created by people without proper typographic training, though. It would be nice if ConTeXt could use en-dashes by default for lists in Norwegian text.
The default is to use bullet, dash, star, triangle for the four levels of itemization.
If you want to change the behaviour in your document only, all you need to do is

    \definesymbol[1][\endash]

but I guess that it could be adapted, so that Norwegian documents will all use endash by default.
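For completeness, a small test file showing the document-level override in context. It is only a sketch: it changes symbol 1 for this one document and leaves the language defaults alone.

```tex
% Document-level override only; the language defaults are untouched.
\definesymbol[1][\endash]

\starttext
\startitemize
\item Item 1
\item Item 2
\item Item 3
\stopitemize
\stoptext
```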
Similar support has already been implemented for Slovenian (to use a different set of characters when itemize uses characters).
There are two questions:
- do other Norwegian users agree to change the default set?
- what should be the order then? (i.e. what character should be used for the second level of itemization?)
We don't use full stops in numbered lists. In other words, instead of
  1. Item 1
  2. Item 2
  3. Item 3
we write
  1 Item 1
  2 Item 2
  3 Item 3
That's a matter of \setupitemize[stopper=]

I don't know how to set that in a language-specific way, but it sounds reasonable to me to add it.
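A minimal test file for the document-level setting mentioned above. It is a sketch of the per-document workaround only, not of a language-specific default, which does not exist yet.

```tex
% Empty stopper: numbered items come out as "1 Item 1", not "1. Item 1".
\setupitemize[stopper=]

\starttext
\startitemize[n]
\item Item 1
\item Item 2
\item Item 3
\stopitemize
\stoptext
```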
The same holds for numbered headings, both in the main text and in the TOC.
But sections already start with

  1 Section name

rather than

  1. Section name

by default. (Support for the second case might be improved in the future. Or rather: I hope that it will be.)
Would it be possible to support this by default in ConTeXt?
We also use the comma in decimal numbers (3,14 instead of 3.14).
We too. In text this is no problem anyway. Math can be set up that way, but I doubt that it is set up for any language (although it could be). This means that you had better write $3{,}14$ instead of $3,14$. I don't know about any other consequences, since TeX almost never writes out calculated floats in the resulting document.
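The difference is easy to see in a small test file. In math mode TeX treats a bare comma as punctuation and adds a thin space after it; wrapping it in braces turns it into an ordinary symbol, so the space disappears.

```tex
\starttext
Text mode: 3,14 needs no special care.

Math mode, comma as punctuation (thin space after the comma): $3,14$

Math mode, comma as ordinary symbol (no extra space): $3{,}14$
\stoptext
```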
Norwegian labels
----------------
Here are the labels for Norwegian (Bokmål and Nynorsk). The old 'no' labels should be removed. The 'nb' ones are taken from the 'no' ones, but with some corrections.
Some comments: We don't usually capitalise the first letter in crossreferences. Where one would in English write
See Figure 5.22 ...
we would write
  Se figur 5.22 ... (Bokmål)
  Sjå figur 5.22 ... (Nynorsk)
But when you cross-reference, you only get 5.22; you have to write "figur" manually (perhaps this can be set up so that "figure" is attached to the number, but in any case you need to do that manually).

"Figur 5.22" will only be used under the actual image. When cross-referencing, we use lowercase too, but under the figure itself I think that uppercase is OK, at least for our language (since it's the caption of the figure anyway).
But we would of course write
Figur 5.22 viser ... (Figure 5.22 shows ...)
The definitions below use a capital first letter. Will this be a problem?
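A sketch of how the two cases come out in a document, which may help answer this. The label under the figure is generated from the language's label text, so it gets the capital letter from the definitions; the word in running text is whatever the author passes to \in, so it can stay lowercase. The reference name fig:demo and the placeholder frame are made up for the example.

```tex
\starttext

% The caption label ("Figur 1" once the Norwegian labels are active)
% is generated by ConTeXt from the label definitions.
\placefigure
  [here][fig:demo]
  {Caption text.}
  {\framed[width=4cm,height=2cm]{demo}}

% The word before the number is typed by the author, so it is lowercase here.
Se \in{figur}[fig:demo] ...

\stoptext
```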
I was also unsure about what the 'lines' label should be. The plural of 'line' ('linje') in Norwegian (both 'nb' and 'nn') is 'linjer', but we do not use the plural when referencing more than one line. Where one would write
The discussion on lines 5--13 ...
in English, we would write
Drøftinga på linje 5--13 ...
in Norwegian. In other words, we use the singular instead of the plural. The same holds for the other cross-referencing terms ('Figure', 'Table' &c.).
Feel free to change the 'lines' label to 'linje' if this makes it work better.
I don't know where exactly this is used, but I assume that it's for "List of Figures", "List of Tables". But I don't know exactly, I never use those. (I have just translated some of them and I hoped that the first one who will consider them wrong will complain ;)
Mojca
--
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
| www.pragma-pod.nl
-----------------------------------------------------------------