Improved support for Norwegian in ConTeXt
I'm writing this to suggest improvements in ConTeXt's support for the Norwegian languages. ConTeXt already has rudimentary support for Norwegian, but with some problems.

Language codes
--------------

The main problem is that ConTeXt uses the language code 'no' for Norwegian. There actually *is* no written language called 'Norwegian'; Norway has two official written languages, Norwegian Bokmål (ISO 639 language code 'nb') and Norwegian Nynorsk (ISO 639 language code 'nn'). The current definitions for 'no' in ConTeXt are for Norwegian Bokmål. (There is an ISO 639 language code 'no' for Norwegian, but this should usually be used for spoken Norwegian, or perhaps for transcriptions of spoken language.)

The language code 'no' should be removed and replaced by the two language codes 'nb' and 'nn'.

See http://en.wikipedia.org/wiki/Norwegian_language for a (not too good) article on the Norwegian languages.

For the record, the language names used in LaTeX/Babel are (unfortunately) 'Norwegian' and 'norsk' for Norwegian Bokmål, and 'nynorsk' for Norwegian Nynorsk, instead of 'bokmal'/'bokmål' and 'nynorsk'. Norwegian Bokmål support was added first, and used up the 'Norwegian' name.

Hyphenation
-----------

The two written languages are quite similar, and the current hyphenation dictionary (nohyphbx) was made to support both. But there are (at least) two words which are put in the hyphenation exceptions for this dictionary because they would be hyphenated differently (because of different meanings) in Norwegian Nynorsk and Norwegian Bokmål. These are:

  attende -- nb: at-ten-de ('eighteenth'), nn: att-en-de ('back')
  betre   -- nb: be-tre ('enter'/'set foot on'), nn: bet-re ('better')

Would it be possible to have two different hyphenation dictionaries for 'nb' and 'nn', which would only differ in the hyphenation exceptions used for these two words?

Language setup
--------------

Here is an improved/correct version of the language setup for Norwegian.
The setup for 'no' should be removed.

  \installlanguage
    [nn]
    [spacing=packed,
     lefthyphenmin=2,
     righthyphenmin=2,
     leftsentence=---,
     rightsentence=---,
     leftsubsentence=---,
     rightsubsentence=---,
     leftquote=\upperleftsinglesixquote,
     rightquote=\upperrightsingleninequote,
     leftquotation=\leftguillemot,
     rightquotation=\rightguillemot,
     date={day,{.},\ ,month,\ ,year},
     state=stop]

This is for Norwegian Nynorsk ('nn'), but the same setup is used for Norwegian Bokmål (the values used for 'day' differ, though -- see below).

But I am not sure I understand what the four *sentence commands are used for. We usually don't use em-dashes in Norwegian, so the entries look incorrect. If you can explain what the commands are used for, I can supply the correct Norwegian definitions.

I also noticed that the Italian definitions use leftspeech, middlespeech and rightspeech commands. What are these used for?

Other language-specific settings
--------------------------------

Norwegian (Bokmål and Nynorsk) differs typographically from English in several other ways. Here are three of them:

We don't (usually) use bullets for the first level of unnumbered lists; we use en-dashes.

  -- Item 1
  -- Item 2
  -- Item 3

Bullets are commonly seen in documents created by word processors of US origin, and in documents created by people without proper typographic training, though. It would be nice if ConTeXt could use en-dashes by default for lists in Norwegian text.

We don't use full stops in numbered lists. In other words, instead of

  1. Item 1
  2. Item 2
  3. Item 3

we write

  1 Item 1
  2 Item 2
  3 Item 3

The same holds for numbered headings, both in the main text and in the TOC. Would it be possible to support this by default in ConTeXt?

We also use the comma in decimal numbers (3,14 instead of 3.14).

Norwegian labels
----------------

Here are labels for Norwegian (Bokmål and Nynorsk). The old 'no' labels should be removed. The 'nb' ones are taken from the 'no' ones, but with some corrections.
Some comments: We don't usually capitalise the first letter in cross-references. Where one would in English write

  See Figure 5.22 ...

we would write

  Se figur 5.22 ... (Bokmål)
  Sjå figur 5.22 ... (Nynorsk)

But we would of course write

  Figur 5.22 viser ... (Figure 5.22 shows ...)

The definitions below use a capital first letter. Will this be a problem?

I was also unsure about what the 'lines' label should be. The plural of 'line' ('linje') in Norwegian (both 'nb' and 'nn') is 'linjer', but we do not use the plural when referencing more than one line. Where one would write

  The discussion on lines 5--13 ...

in English, we would write

  Drøftinga på linje 5--13 ...

in Norwegian. In other words, we use the singular instead of the plural. The same holds for the other cross-referencing terms ('Figure', 'Table' &c.).

Feel free to change the 'lines' label to 'linje' if this makes it work better.

\setupheadtext [\s!nb] [\v!content=Innhold]
\setupheadtext [\s!nn] [\v!content=Innhald]
\setupheadtext [\s!nb] [\v!tables=Tabeller]
\setupheadtext [\s!nn] [\v!tables=Tabellar]
\setupheadtext [\s!nb] [\v!figures=Figurer]
\setupheadtext [\s!nn] [\v!figures=Figurar]
\setupheadtext [\s!nb] [\v!graphics=Bilde]
\setupheadtext [\s!nn] [\v!graphics=Bilete]
\setupheadtext [\s!nb] [\v!intermezzi=Intermesso]
\setupheadtext [\s!nn] [\v!intermezzi=Intermesso]
\setupheadtext [\s!nb] [\v!index=Register]
\setupheadtext [\s!nn] [\v!index=Register]
\setupheadtext [\s!nb] [\v!abbreviations=Forkortelser]
\setupheadtext [\s!nn] [\v!abbreviations=Forkortingar]
\setupheadtext [\s!nb] [\v!logos=Logoer]
\setupheadtext [\s!nn] [\v!logos=Logoar]
\setupheadtext [\s!nb] [\v!units=Enheter]
\setupheadtext [\s!nn] [\v!units=Einingar]

\setuplabeltext [\s!nb] [\v!table=Tabell ]
\setuplabeltext [\s!nn] [\v!table=Tabell ]
\setuplabeltext [\s!nb] [\v!figure=Figur ]
\setuplabeltext [\s!nn] [\v!figure=Figur ]
\setuplabeltext [\s!nb] [\v!intermezzo=Intermesso ]
\setuplabeltext [\s!nn] [\v!intermezzo=Intermesso ]
\setuplabeltext [\s!nb] [\v!graphic=Bilde ]
\setuplabeltext [\s!nn] [\v!graphic=Bilete ]

\setuplabeltext [\s!nb] [\v!chapter=]
\setuplabeltext [\s!nn] [\v!chapter=]
\setuplabeltext [\s!nb] [\v!section=]
\setuplabeltext [\s!nn] [\v!section=]
\setuplabeltext [\s!nb] [\v!subsection=]
\setuplabeltext [\s!nn] [\v!subsection=]
\setuplabeltext [\s!nb] [\v!subsubsection=]
\setuplabeltext [\s!nn] [\v!subsubsection=]
\setuplabeltext [\s!nb] [\v!subsubsubsection=]
\setuplabeltext [\s!nn] [\v!subsubsubsection=]
\setuplabeltext [\s!nb] [\v!appendix=] % Tillegg
\setuplabeltext [\s!nn] [\v!appendix=] % Tillegg
\setuplabeltext [\s!nb] [\v!part=Del]
\setuplabeltext [\s!nn] [\v!part=Del]

\setuplabeltext [\s!nb] [\v!line=linje ]
\setuplabeltext [\s!nn] [\v!line=linje ]
\setuplabeltext [\s!nb] [\v!lines=linjer ]
\setuplabeltext [\s!nn] [\v!lines=linjer ]

\setuplabeltext [\s!nb] [\v!january=januar]
\setuplabeltext [\s!nb] [\v!february=februar]
\setuplabeltext [\s!nb] [\v!march=mars]
\setuplabeltext [\s!nb] [\v!april=april]
\setuplabeltext [\s!nb] [\v!may=mai]
\setuplabeltext [\s!nb] [\v!june=juni]
\setuplabeltext [\s!nb] [\v!july=juli]
\setuplabeltext [\s!nb] [\v!august=august]
\setuplabeltext [\s!nb] [\v!september=september]
\setuplabeltext [\s!nb] [\v!october=oktober]
\setuplabeltext [\s!nb] [\v!november=november]
\setuplabeltext [\s!nb] [\v!december=desember]

\setuplabeltext [\s!nn] [\v!january=januar]
\setuplabeltext [\s!nn] [\v!february=februar]
\setuplabeltext [\s!nn] [\v!march=mars]
\setuplabeltext [\s!nn] [\v!april=april]
\setuplabeltext [\s!nn] [\v!may=mai]
\setuplabeltext [\s!nn] [\v!june=juni]
\setuplabeltext [\s!nn] [\v!july=juli]
\setuplabeltext [\s!nn] [\v!august=august]
\setuplabeltext [\s!nn] [\v!september=september]
\setuplabeltext [\s!nn] [\v!october=oktober]
\setuplabeltext [\s!nn] [\v!november=november]
\setuplabeltext [\s!nn] [\v!december=desember]

\setuplabeltext [\s!nb] [\v!sunday=s\ostroke ndag]
\setuplabeltext [\s!nb] [\v!monday=mandag]
\setuplabeltext [\s!nb] [\v!tuesday=tirsdag]
\setuplabeltext [\s!nb] [\v!wednesday=onsdag]
\setuplabeltext [\s!nb] [\v!thursday=torsdag]
\setuplabeltext [\s!nb] [\v!friday=fredag]
\setuplabeltext [\s!nb] [\v!saturday=l\ostroke rdag]

\setuplabeltext [\s!nn] [\v!sunday=sundag]
\setuplabeltext [\s!nn] [\v!monday=m\aring ndag]
\setuplabeltext [\s!nn] [\v!tuesday=tysdag]
\setuplabeltext [\s!nn] [\v!wednesday=onsdag]
\setuplabeltext [\s!nn] [\v!thursday=torsdag]
\setuplabeltext [\s!nn] [\v!friday=fredag]
\setuplabeltext [\s!nn] [\v!saturday=laurdag]

\setuplabeltext [\s!nb] [\v!page=side ]
\setuplabeltext [\s!nb] [\v!atpage=p\aring\ side ]
\setuplabeltext [\s!nb] [\v!hencefore=som vist over]
\setuplabeltext [\s!nb] [\v!hereafter=som vist under]
\setuplabeltext [\s!nb] [\v!see=se ]

\setuplabeltext [\s!nn] [\v!page=side ]
\setuplabeltext [\s!nn] [\v!atpage=p\aring\ side ]
\setuplabeltext [\s!nn] [\v!hencefore=som vist over]
\setuplabeltext [\s!nn] [\v!hereafter=som vist under]
\setuplabeltext [\s!nn] [\v!see=sj\aring\ ]

\setuplabeltext [\s!nb] [\v!january :\s!mnem=jan.]
\setuplabeltext [\s!nb] [\v!february :\s!mnem=feb.]
\setuplabeltext [\s!nb] [\v!march :\s!mnem=mars]
\setuplabeltext [\s!nb] [\v!april :\s!mnem=april]
\setuplabeltext [\s!nb] [\v!may :\s!mnem=mai]
\setuplabeltext [\s!nb] [\v!june :\s!mnem=juni]
\setuplabeltext [\s!nb] [\v!july :\s!mnem=juli]
\setuplabeltext [\s!nb] [\v!august :\s!mnem=aug.]
\setuplabeltext [\s!nb] [\v!september:\s!mnem=sep.]
\setuplabeltext [\s!nb] [\v!october :\s!mnem=okt.]
\setuplabeltext [\s!nb] [\v!november :\s!mnem=nov.]
\setuplabeltext [\s!nb] [\v!december :\s!mnem=des.]

\setuplabeltext [\s!nn] [\v!january :\s!mnem=jan.]
\setuplabeltext [\s!nn] [\v!february :\s!mnem=feb.]
\setuplabeltext [\s!nn] [\v!march :\s!mnem=mars]
\setuplabeltext [\s!nn] [\v!april :\s!mnem=april]
\setuplabeltext [\s!nn] [\v!may :\s!mnem=mai]
\setuplabeltext [\s!nn] [\v!june :\s!mnem=juni]
\setuplabeltext [\s!nn] [\v!july :\s!mnem=juli]
\setuplabeltext [\s!nn] [\v!august :\s!mnem=aug.]
\setuplabeltext [\s!nn] [\v!september:\s!mnem=sep.]
\setuplabeltext [\s!nn] [\v!october :\s!mnem=okt.]
\setuplabeltext [\s!nn] [\v!november :\s!mnem=nov.]
\setuplabeltext [\s!nn] [\v!december :\s!mnem=des.]

Feel free to contact me with any questions or comments. :)

-- 
Karl Ove Hufthammer
E-mail and Jabber: karl@huftis.org
I would suggest that you post some of the questions to the ntg mailing list, where more Norwegian users can comment on them. When making changes, 100% backward compatibility might need to be sacrificed a bit, but some changes are worth making if others agree and if they contribute to better quality. (I CC'd two users who seem to have contributed or complained a bit ;)

On 2/4/07, Karl Ove Hufthammer wrote:
I'm writing this to suggest improvements in ConTeXt's support for the Norwegian languages. ConTeXt already has rudimentary support for Norwegian, but with some problems.
Language codes
--------------
The main problem is that ConTeXt uses the language code 'no' for Norwegian. There actually *is* no written language called 'Norwegian'; Norway has two official written languages, Norwegian Bokmål (ISO 639 language code 'nb') and Norwegian Nynorsk (ISO 639 language code 'nn'). The current definitions for 'no' in ConTeXt are for Norwegian Bokmål. (There is an ISO 639 language code 'no' for Norwegian, but this should usually be used for spoken Norwegian, or perhaps for transcriptions of spoken language.)
The language code 'no' should be removed, and be replaced by the two language codes 'nb' and 'nn'.
Although I don't know the exact situation, a few remarks:

- You should probably also provide the correct definitions for calling the language (so that one can say \mainlanguage[norwegian], but perhaps with what you consider to be the proper language tags). It's currently

    \installlanguage [norwegian] [\s!no]
    \installlanguage [norsk] [\s!no] % bonus switch

  You need to fix these two and perhaps add

    \installlanguage [???] [\s!nb]
    \installlanguage [???] [\s!nn]

- If you remove [no], older documents might break. I don't know much about the situation and the number of users, but can you say which of the two language variants [no] should default to? Since the current definitions probably point to "nb" (at first glance), would it make sense to use "nb" when one says \mainlanguage[no]?

  Perhaps one can issue a warning when the language "no" is selected (stating something like "language 'no' is deprecated, please use 'nb' for Bokmål or 'nn' for Nynorsk instead").

I also asked to replace "si" by "sl" for Slovenian some time ago, but that was when there was no support for Slovenian yet, and "si" stands for Singhalese (whatever that is).

For Norwegian the situation might be slightly different since "no" still means Norwegian, but I don't know how "offensive"/"ignorant" it sounds to you if that one is used.

Removing it probably doesn't affect the rest, so if other Norwegian users agree to remove it completely, it can still be done, but I would suggest that you ask the author of the original translations and the rest of the users on the ntg-context mailing list first. Otherwise it can still default to one of the two variants (or to a new one if you also provide the third alternative for the "spoken language").
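A minimal sketch of how the synonyms discussed above might look, assuming that the two-argument form of \installlanguage shown above keeps working as a synonym definition. The names 'bokmal' and 'nynorsk' and the mapping of the old names to 'nb' are assumptions for illustration, not settled decisions:

```tex
% Hypothetical synonyms for the two written languages
\installlanguage [bokmal]    [\s!nb]
\installlanguage [nynorsk]   [\s!nn]

% Assumed backward-compatible mapping, so that older documents
% saying \mainlanguage[norwegian] keep getting Bokmål
\installlanguage [norwegian] [\s!nb]
\installlanguage [norsk]     [\s!nb]
```

Whether [no] itself should also map to 'nb', and whether a deprecation warning is issued when it is selected, is left open here.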
See http://en.wikipedia.org/wiki/Norwegian_language for a (not too good) article on the Norwegian languages.
For the record, the language names used in LaTeX/Babel are (unfortunately) 'Norwegian' and 'norsk' for Norwegian Bokmål, and 'nynorsk' for Norwegian Nynorsk, instead of 'bokmal'/'bokmål' and 'nynorsk'. Norwegian Bokmål support was added first, and used up the 'Norwegian' name.
Hyphenation
-----------
The two written languages are quite similar, and the current hyphenation dictionary (nohyphbx) was made to support both. But there are (at least) two words which are put in the hyphenation exceptions for this dictionary because they would be hyphenated differently (because of different meanings) in Norwegian Nynorsk and Norwegian Bokmål. These are:
  attende -- nb: at-ten-de ('eighteenth'), nn: att-en-de ('back')
  betre   -- nb: be-tre ('enter'/'set foot on'), nn: bet-re ('better')
Would it be possible to have two different hyphenation dictionaries for 'nb' and 'nn', which would only differ in the hyphenation exceptions used for these two words?
This can be done. Hans was complaining about the mess of (naming of) Norwegian hyphenation patterns one month ago anyway, so I guess that "he won't mind" adding yet another fix to the scripts ;)
Language setup
--------------
Here is an improved/correct version of the language setup for Norwegian. The setup for 'no' should be removed.
  \installlanguage
    [nn]
    [spacing=packed,
     lefthyphenmin=2,
     righthyphenmin=2,
     leftsentence=---,
     rightsentence=---,
     leftsubsentence=---,
     rightsubsentence=---,
     leftquote=\upperleftsinglesixquote,
     rightquote=\upperrightsingleninequote,
     leftquotation=\leftguillemot,
     rightquotation=\rightguillemot,
     date={day,{.},\ ,month,\ ,year},
     state=stop]
This is for Norwegian Nynorsk ('nn'), but the same setup is used for Norwegian Bokmål (the values used for 'day' differ, though -- see below).
But I am not sure I understand what the four *sentence commands are used for. We usually don't use em-dashes in Norwegian, so the entries look incorrect. If you can explain what the commands are used for, I can supply the correct Norwegian definitions.
I also noticed that the Italian definitions use leftspeech, middlespeech and rightspeech commands. What are these used for?
Other language-specific settings
--------------------------------
Norwegian (Bokmål and Nynorsk) differs typographically from English in several other ways. Here are three of them:
We don't (usually) use bullets for the first level of unnumbered lists; we use en-dashes.
  -- Item 1
  -- Item 2
  -- Item 3
Bullets are commonly seen in documents created by word processors of US origin, and in documents created by people without proper typographic training, though. It would be nice if ConTeXt could use en-dashes by default for lists in Norwegian text.
The default is to use bullet, dash, star, triangle for the four levels of itemization. If you want to change the behaviour in your document only, all you need to do is

    \definesymbol[1][\endash]

but I guess that it could be adapted so that all Norwegian documents use the en-dash by default. Similar support has already been implemented for Slovenian (to use a different set of characters when itemize uses characters).

There are two questions:
- do other Norwegian users agree to change the default set?
- what should be the order then? (i.e., what character should be used for the second level of itemization?)
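As a sketch of the document-level change described above (only the \definesymbol line comes from this message; the rest is an ordinary test document):

```tex
% Use an en-dash instead of a bullet for first-level items
\definesymbol[1][\endash]

\starttext
\startitemize
  \item Item 1
  \item Item 2
  \item Item 3
\stopitemize
\stoptext
```

This only affects the document that contains the \definesymbol line; a language-specific default would have to be set inside ConTeXt itself.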
We don't use full stops in numbered lists. In other words, instead of
  1. Item 1
  2. Item 2
  3. Item 3
we write
  1 Item 1
  2 Item 2
  3 Item 3
That's a matter of \setupitemize[stopper=]. I don't know how to set that in a language-specific way, but it sounds reasonable to me to add it.
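A small test document for the setting mentioned above, assuming it is applied globally before the list starts:

```tex
% Drop the full stop after the number in numbered lists
\setupitemize[stopper=]

\starttext
\startitemize[n]   % n = numbered items
  \item Item 1
  \item Item 2
  \item Item 3
\stopitemize
\stoptext
```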
The same holds for numbered headings, both in the main text and in the TOC.
But sections already start with 1 Section name rather than 1. Section name by default. (Support for the second case might be improved in the future. Or rather: I hope that it will be.)
Would it be possible to support this by default in ConTeXt?
We also use the comma in decimal numbers (3,14 instead of 3.14).
We do too. In text this is no problem anyway. Math can be set up that way, but I doubt that it is set up for any language (although it could be). This means that you had better write $3{,}14$ instead of $3,14$. I don't know about any other consequences, since TeX almost never writes out any calculated floats in the resulting document.
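The reason for the braces: in math mode TeX treats a bare comma as punctuation and adds a small space after it, which looks wrong inside a decimal number. A short comparison:

```tex
\starttext
$3,14$    % comma set as punctuation, with a thin space after it
$3{,}14$  % braces make the comma an ordinary symbol, no extra space
\stoptext
```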
Norwegian labels
----------------
Here are labels for Norwegian (Bokmål and Nynorsk). The old 'no' labels should be removed. The 'nb' ones are taken from the 'no' ones, but with some corrections.
Some comments: We don't usually capitalise the first letter in crossreferences. Where one would in English write
See Figure 5.22 ...
we would write
  Se figur 5.22 ... (Bokmål)
  Sjå figur 5.22 ... (Nynorsk)
But when you cross-reference, you only get 5.22; you have to write "figur" manually (you can perhaps set it up so that "figure" is attached to the number, but in any case you need to do that manually). "Figur 5.22" will only be used under the actual image. When cross-referencing, we use lowercase too, but under the figure itself I think that uppercase is OK, at least for our language (since it's the caption of the figure anyway).
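To illustrate the point about references, here is a sketch; the reference name fig:demo, the caption and the empty frame are made up for the example:

```tex
\starttext
\placefigure
  [here][fig:demo]
  {Ei ramme.}                          % caption; the label before it comes from \setuplabeltext
  {\framed[width=4cm,height=2cm]{}}    % placeholder content

Sjå figur \in[fig:demo].   % lowercase word typed by hand, \in gives only the number
Sjå \in{figur}[fig:demo].  % the same word passed to \in as a prefix
\stoptext
```

In both cases the lowercase word comes from the author, so the capitalised label text only shows up in the caption.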
But we would of course write
Figur 5.22 viser ... (Figure 5.22 shows ...)
The definitions below use a capital first letter. Will this be a problem?
I was also unsure about what the 'lines' label should be. The plural of 'line' ('linje') in Norwegian (both 'nb' and 'nn') is 'linjer', but we do not use the plural when referencing more than one line. Where one would write
The discussion on lines 5--13 ...
in English, we would write
Drøftinga på linje 5--13 ...
in Norwegian. In other words, we use the singular instead of the plural. The same holds for the other cross-referencing terms ('Figure', 'Table' &c.).
Feel free to change the 'lines' label to 'linje' if this makes it work better.
I don't know exactly where this is used, but I assume that it's for "List of Figures" and "List of Tables". I never use those. (I have just translated some of them, and I hoped that the first person to consider them wrong would complain ;)

Mojca
Hi Mojca

an impressive summary ... just provide the patches needed, take a look at 'de' and 'deo' ... we can clone languages so definitions can be shared

concerning conversions ... there are some language specific things, take a look at chinese (s-chi*)
-- ----------------------------------------------------------------- Hans Hagen | PRAGMA ADE Ridderstraat 27 | 8061 GH Hasselt | The Netherlands tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl -----------------------------------------------------------------
On 2/4/07, Hans Hagen wrote:
Hi Mojca
an impressive summary ... just provide the patches needed, take a look at 'de' and 'deo' ... we can clone languages so definitions can be shared
For a few things it would be wise to wait for responses first.

Also, these are responses by Karl Berry:

For the record, the language names used in LaTeX/Babel are (unfortunately) 'Norwegian' and 'norsk' for Norwegian Bokmål, and 'nynorsk' for Norwegian Nynorsk, instead of 'bokmal'/'bokmål' and 'nynorsk'.

Naturally, only the Babel maintainers can change Babel. All I can suggest is filing a report in the latex bug database. Although quick action is unlikely, at least that way it can be considered. Presumably the bokmal name could become the canonical one, and Norwegian+norsk just left as aliases for compatibility.

What I can do in TL is add "bokmal" as an alias for "norsk" and "norwegian", in the hopes that it could prove useful in the future, if Babel changes. I did that.

Would it be possible to have two different hyphenation dictionaries for 'nb' and 'nn', which would only differ in the hyphenation exceptions used for these two words?

Anything is possible. Maybe a good approach would be to create nbhyph.tex and nnhyph.tex, which \input nohyphbx.tex and then add the necessary exceptions. Then I can make languages nb and nn in TL which \input the new files.

Can you upload a new package to CTAN with those files? I'd like to have it on CTAN to start with so that, among other reasons, MiKTeX will also have a chance to benefit. For instance, the package could be installed in something like CTAN:language/hyphenation/norwegian.

This is something that the natives have to take care of ;)
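A rough sketch of what the two wrapper files proposed above could contain. It assumes that an exception given after the \input takes precedence over the entry for the same word inside nohyphbx.tex; if it does not, the two words would have to be removed from the shared file instead:

```tex
% nbhyph.tex (proposed name, Norwegian Bokmål)
\input nohyphbx.tex
\hyphenation{at-ten-de be-tre}    % 'eighteenth', 'set foot on'

% nnhyph.tex (proposed name, Norwegian Nynorsk)
\input nohyphbx.tex
\hyphenation{att-en-de bet-re}    % 'back', 'better'
```

The two listings are separate files; they are shown together only to make the difference between them visible.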
concerning conversions ... there are some language specific things, take a look at chinese (s-chi*)
Thanks. Mojca
On Monday 05 February 2007 01:41, Mojca Miklavec wrote:
Naturally, only the Babel maintainers can change Babel. All I can suggest is filing a report in the latex bug database. Although quick action is unlikely, at least that way it can be considered. Presumably the bokmal name could become the canonical one, and Norwegian+norsk just left as aliases for compatibility.
Yes, that sounds like a good solution.
Would it be possible to have two different hyphenation dictionaries for 'nb' and 'nn', which would only differ in the hyphenation exceptions used for these two words?
Anything is possible. Maybe a good approach would be to create nbhyph.tex and nnhyph.tex, which \input nohyphbx.tex and then add the necessary exceptions. Then I can make languages nb and nn in TL which \input the new files.
Can you upload a new package to CTAN with those files?
Will do. I have contacted the author of the nohyphbx.tex hyphenation patterns to hear if he has any comments (or objections!). If everything works out all right, I will upload the new files to CTAN in the not too distant future.

-- 
Karl Ove Hufthammer
E-mail and Jabber: karl@huftis.org
* Mojca Miklavec
I would suggest that you post some of the questions to the ntg mailing list, where more Norwegian users can comment on them. When making changes, 100% backward compatibility might need to be sacrificed a bit, but some changes are worth it if others agree and if they contribute towards better quality. (I CC-ed two users who seem to have contributed or complained a bit ;)
On 2/4/07, Karl Ove Hufthammer wrote:
I'm writing this to suggest improvements in ConTeXt's support for the Norwegian languages. ConTeXt already has rudimentary support for Norwegian, but with some problems.
Language codes
--------------
The main problem is that ConTeXt uses the language code 'no' for Norwegian. There actually *is* no written language called 'Norwegian'; Norway has two official written languages, Norwegian Bokmål (ISO 639 language code 'nb') and Norwegian Nynorsk (ISO 639 language code 'nn'). The current definitions for 'no' in ConTeXt are for Norwegian Bokmål. (There is an ISO 639 language code 'no' for Norwegian, but this should usually be used for spoken Norwegian, or perhaps for transcriptions of spoken language.)
The language code 'no' should be removed and replaced by the two language codes 'nb' and 'nn'. [snip]
I'm not actively using ConTeXt anymore, but I can confirm that there should be no "no" code. We want "nb" and "nn" ;-) I think "no" should be deprecated and give a (useful) warning/error, telling people to switch to "nb". The rest of Karl Ove's comments/suggestions seem to be correct; he normally knows what he is talking about.
Best regards,
Hans Nordhaug
On Sunday 04 February 2007 18:16, Mojca Miklavec wrote:
I would suggest that you post some of the questions to the ntg mailing list, where more Norwegian users can comment on them.
OK. I'm now crossposting this e-mail to both the dev and the ntg mailing list. See my answers to some of your questions below.
On 2/4/07, Karl Ove Hufthammer wrote:
I'm writing this to suggest improvements in ConTeXt's support for the Norwegian languages. ConTeXt already has rudimentary support for Norwegian, but with some problems.
Language codes
--------------
The main problem is that ConTeXt uses the language code 'no' for Norwegian. There actually *is* no written language called 'Norwegian'; Norway has two official written languages, Norwegian Bokmål (ISO 639 language code 'nb') and Norwegian Nynorsk (ISO 639 language code 'nn'). The current definitions for 'no' in ConTeXt are for Norwegian Bokmål. (There is an ISO 639 language code 'no' for Norwegian, but this should usually be used for spoken Norwegian, or perhaps for transcriptions of spoken language.)
The language code 'no' should be removed and replaced by the two language codes 'nb' and 'nn'.
Although I don't know the exact situation, a few remarks:
- You should probably also provide the correct definitions for calling the language (so that one can say \mainlanguage[norwegian], but perhaps with what you consider to be the proper language tags). It's currently
\installlanguage [norwegian] [\s!no]
\installlanguage [norsk] [\s!no] % bonus switch
You need to fix the two and perhaps add
\installlanguage [???] [\s!nb]
\installlanguage [???] [\s!nn]
OK. We will need:
\installlanguage [bokmal] [\s!nb]
\installlanguage [nynorsk] [\s!nn]
If it is possible to use non-ASCII characters safely, the following would also be nice:
\installlanguage [bokmål] [\s!nb]
- If you remove [no], older documents might break. I don't know much about the situation and the number of users, but can you say which of the two language variants [no] should default to? Since the current definitions probably point to "nb" (at first glance), would it make sense to use "nb" when one says \mainlanguage[no]?
Yes.
Perhaps one can issue a warning when the language "no" is selected (stating something like "language 'no' is deprecated, please use 'nb' for Bokmål or 'nn' for Nynorsk instead")
Yes, that would be the preferred solution. As Hans F. Nordhaug mentioned, the 'no' code should be considered deprecated in this context (no pun intended). To sum up, we need the following language codes: nb and nn. And we need the following mappings:
bokmal --> nb
bokmål --> nb (if possible)
nynorsk --> nn
norsk --> nb (with warning)
norwegian --> nb (with warning)
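As a rough sketch only, the mappings summarised above could look like this at user level. It assumes that 'nb' and 'nn' have already been installed as languages, and it uses plain tags instead of the \s! constants that the core files use. The deprecation warning is not shown, since the thread does not settle how it should be hooked in.

```tex
% Sketch, assuming nb and nn are already installed languages.
\installlanguage [bokmal]    [nb]
\installlanguage [nynorsk]   [nn]

% Compatibility names: resolve to Bokmål, as the old 'no' setup did.
\installlanguage [norsk]     [nb]
\installlanguage [norwegian] [nb]
\installlanguage [no]        [nb]
```

With something like this in place, older documents that say \mainlanguage[no] or \mainlanguage[norwegian] would keep getting the Bokmål definitions.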
Removing it probably doesn't affect the rest, so if other Norwegian users agree to remove it completely, it can still be done, but I would suggest that you first ask the author of the original translations and the other users on the ntg-context mailing list. Otherwise it can still default to one of the two variants (or to a new one if you also provide a third alternative for the "spoken language").
See http://en.wikipedia.org/wiki/Norwegian_language for a (not too good) article on the Norwegian languages.
For the record, the language names used in LaTeX/Babel are (unfortunately) 'Norwegian' and 'norsk' for Norwegian Bokmål, and 'nynorsk' for Norwegian Nynorsk, instead of 'bokmal'/'bokmål' and 'nynorsk'. Norwegian Bokmål support was added first, and used up the 'Norwegian' name.
Hyphenation
-----------
The two written languages are quite similar, and the current hyphenation dictionary (nohyphbx) was made to support both. But there are (at least) two words which are put in the hyphenation exceptions for this dictionary because they would have different hyphenation (because of different meaning) in Norwegian Nynorsk and Norwegian Bokmål. These are:
attende -- nb: at-ten-de ('eighteenth'), nn: att-en-de ('back')
betre -- nb: be-tre ('enter'/'set foot on'), nn: bet-re ('better')
Would it be possible to have two different hyphenation dictionaries for 'nb' and 'nn', which would only differ in the hyphenation exceptions used for these two words?
This can be done. Hans was complaining about the mess of (naming of) Norwegian hyphenation patterns one month ago anyway, I guess that "he won't mind" adding yet another fix to the scripts ;)
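For illustration, the wrapper files that Karl Berry proposed could be as small as the sketch below. The file names nbhyph.tex and nnhyph.tex are his suggestion; the contents are only an assumption about how they might look. It relies on a later \hyphenation entry taking precedence over an earlier one for the same word; alternatively the two words could simply be dropped from the shared file.

```tex
% --- nbhyph.tex: Norwegian Bokmål (sketch) ---
\input nohyphbx.tex              % shared patterns and exceptions
\hyphenation{at-ten-de be-tre}   % 'eighteenth', 'enter'

% --- nnhyph.tex: Norwegian Nynorsk (sketch) ---
\input nohyphbx.tex              % shared patterns and exceptions
\hyphenation{att-en-de bet-re}   % 'back', 'better'
```

The two parts are meant to be saved as separate files, each loaded for its own language when the format is built.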
Language setup
--------------
Here is an improved/correct version of the language setup for Norwegian. The setup for 'no' should be removed.
\installlanguage
  [nn]
  [spacing=packed,
   lefthyphenmin=2,
   righthyphenmin=2,
   leftsentence=---,
   rightsentence=---,
   leftsubsentence=---,
   rightsubsentence=---,
   leftquote=\upperleftsinglesixquote,
   rightquote=\upperrightsingleninequote,
   leftquotation=\leftguillemot,
   rightquotation=\rightguillemot,
   date={day,{.},\ ,month,\ ,year},
   state=stop]
This is for Norwegian Nynorsk ('nn'), but the same setup is used for Norwegian Bokmål (the values used for 'day' differ, though -- see below).
But I am not sure I understand what the four *sentence commands are used for. We usually don't use em-dashes in Norwegian, so the entries look incorrect. If you can explain what the commands are used for, I can supply the correct Norwegian definitions.
I also noticed that the Italian definitions use leftspeech, middlespeech and rightspeech commands. What are these used for?
Other language-specific settings
--------------------------------
Norwegian (Bokmål and Nynorsk) differs typographically from English in several other ways. Here are three of them:
We don't (usually) use bullets for the first level of unnumbered lists; we use en-dashes.
-- Item 1
-- Item 2
-- Item 3
Bullets are commonly seen in documents created by word processors of US origin, and in documents created by people without proper typographic training, though. It would be nice if ConTeXt could use en-dashes by default for lists in Norwegian text.
The default is to use bullet, dash, star, triangle for the four levels of itemization.
If you want to change the behaviour in your document only, all you need to do is
\definesymbol[1][\endash]
but I guess that it could be adapted, so that Norwegian documents will all use endash by default.
Similar support has already been implemented for Slovenian (to use a different set of characters when itemize uses characters).
There are two questions:
- do other Norwegian users agree to change the default set?
- what should be the order then? (i.e. what character should be used for the second level of itemization?)
My suggestion is
\definesymbol[1][{\symbol[dash]}]
\definesymbol[2][{\symbol[star]}]
\definesymbol[3][{\symbol[circle]}]
\definesymbol[4][{\symbol[bullet]}]
\definesymbol[5][{\symbol[triangle]}]
and leave levels 6+ at their defaults. Norwegian people, feel free to comment on this. :)
We don't use full stops in numbered lists. In other words, instead of
1. Item 1
2. Item 2
3. Item 3
we write
1 Item 1
2 Item 2
3 Item 3
That's a matter of \setupitemize[stopper=]
I don't know how to set that in a language-specific way, but it sounds reasonable to me to add it.
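Until there is a language-specific default, the two list settings discussed so far can be tried in a single document. This is only a minimal test file that combines the \definesymbol and \setupitemize lines quoted in this thread; it is not a proposal for how the defaults should be implemented.

```tex
% Per-document settings, not language defaults.
\definesymbol[1][\endash]   % en-dash for first-level unnumbered items
\setupitemize[stopper=]     % no full stop after item numbers

\starttext

\startitemize               % unnumbered list
  \item Item 1
  \item Item 2
\stopitemize

\startitemize[n]            % numbered list
  \item Item 1
  \item Item 2
\stopitemize

\stoptext
```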
The same holds for numbered headings, both in the main text and in the TOC.
But sections already start with "1 Section name" rather than "1. Section name" by default. (Support for the second case might be improved in the future. Or rather: I hope that it will be.)
Would it be possible to support this by default in ConTeXt?
We also use the comma in decimal numbers (3,14 instead of 3.14).
We too. In text this is no problem anyway. Math can be set up that way, but I doubt that it is set up for any language (although it could be). This means that you had better write $3{,}14$ instead of $3,14$.
OK. I guess this is an adequate solution. A problem occurs only when people write $3,14$ without thinking, and don't notice that the result looks really bad. In LaTeX, there is a package ncccomma that defines an 'intelligent' comma to fix this, so that you can use the comma as both a decimal separator and a list separator. (The comma in ncccomma is much more 'intelligent' than the one in 'icomma', BTW.)
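A small test file makes the difference visible. In math mode TeX treats the comma as punctuation and adds a thin space after it; the braces turn it into an ordinary symbol, so the space disappears. The sketch uses nothing language-specific.

```tex
\starttext

% Decimal number: the braces suppress the space after the comma.
Wrong: $3,14$ \quad Right: $3{,}14$

% List separator: here the extra space is wanted.
$f(x, y)$ and $(1, 2, 3)$

\stoptext
```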
I don't know about any other consequences, since TeX almost never writes out any calculated floats in the resulting document.
Norwegian labels
----------------
Here are labels for Norwegian (Bokmål and Nynorsk). The old 'no' labels should be removed. The 'nb' ones are taken from the 'no' ones, but with some corrections.
Some comments: We don't usually capitalise the first letter in crossreferences. Where one would in English write
See Figure 5.22 ...
we would write
Se figur 5.22 ... (Bokmål)
Sjå figur 5.22 ... (Nynorsk)
But when you crossreference, you only get 5.22, you have to write "figur" manually (you can set up that perhaps, so that you get "figure" attached to the number, but in any case you need to do that manually).
OK. No problem then. :)
"Figur 5.22" will only be used under the actual image. When crossreferencing, we use lowercase too, but under the figure itself I think that uppercase is OK, at least for our language (since it's the caption of the figure anyway).
But we would of course write
Figur 5.22 viser ... (Figure 5.22 shows ...)
The definitions below use a capital first letter. Will this be a problem?
I was also unsure about what the 'lines' label should be. The plural of 'line' ('linje') in Norwegian (both 'nb' and 'nn') is 'linjer', but we do not use the plural when referencing more than one line. Where one would write
The discussion on lines 5--13 ...
in English, we would write
Drøftinga på linje 5--13 ...
in Norwegian. In other words, we use the singular instead of the plural. The same holds for the other cross-referencing terms ('Figure', 'Table' &c.).
Feel free to change the 'lines' label to 'linje' if this makes it work better.
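As a sketch of what that would mean in practice, the labels could be set per language with \setuplabeltext. The key names line and lines are assumed here to match the English label set, and only the words mentioned in this message are filled in; the full label lists are a separate matter.

```tex
% Sketch for Nynorsk; the same lines with [nb] would cover Bokmål.
\setuplabeltext [nn] [figure={Figur }]  % caption label, capitalised
\setuplabeltext [nn] [line={linje }]
\setuplabeltext [nn] [lines={linje }]   % singular also for ranges
```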
I don't know where exactly this is used, but I assume that it's for "List of Figures", "List of Tables". But I don't know exactly, I never use those. (I have just translated some of them and I hoped that the first one who will consider them wrong will complain ;)
:)
--
Karl Ove Hufthammer
E-mail and Jabber: karl@huftis.org
participants (4)
- Hans F. Nordhaug
- Hans Hagen
- Karl Ove Hufthammer
- Mojca Miklavec