Hello,
this example has stopped working. I don't know exactly when, possibly with the newest ConTeXt beta. Consider two source files:
--- t.mkiv
\mainlanguage[cz]
\enableregime[cp1250]
\starttext
\input t2.mkiv
AáSš
\stoptext
---
And:
--- t2.mkiv
AáSš
---
t2.mkiv is included by the main file t.mkiv.
The strange thing is that characters with diacritics in the included file are processed correctly, whereas those in the main file are not.
Here's the process log:
---
MTXrun | run 1: luatex --fmt="c:/ConTeXt/tex/texmf-cache/luatex-cache/context/f53042fa2e1c106bc7e3383ec8c3a00c/formats/cont-en" --lua="c:/ConTeXt/tex/texmf-cache/luatex-cache/context/f53042fa2e1c106bc7e3383ec8c3a00c/formats/cont-en.lui" --backend=pdf "C:/Lukas/Jobs/Holesov.PDPS/SO_201/Texts/T/t.mkiv"
This is LuaTeX, Version beta-0.64.0-2010111223 (rev 3956)
\write18 enabled.
(C:/Lukas/Jobs/Holesov.PDPS/SO_201/Texts/T/t.mkiv
ConTeXt ver: 2010.11.27 14:27 MKIV fmt: 2010.11.29 int: english/english
system : cont-new loaded
(c:/ConTeXt/tex/texmf-context/tex/context/base/cont-new.tex
systems : beware: some patches loaded from cont-new.tex
(c:/ConTeXt/tex/texmf-context/tex/context/base/cont-new.mkiv))
system : cont-fil.mkiv loaded
(c:/ConTeXt/tex/texmf-context/tex/context/base/cont-fil.mkiv
loading : ConTeXt File Synonyms
)
system : cont-sys.rme loaded
(c:/ConTeXt/tex/texmf-context/tex/context/user/cont-sys.rme (c:/ConTeXt/tex/texmf-context/tex/context/base/type-def.mkiv) (c:/ConTeXt/tex/texmf-context/tex/context/base/type-lua.mkiv) (c:/ConTeXt/tex/texmf-context/tex/context/base/type-siz.mkiv) (c:/ConTeXt/tex/texmf-context/tex/context/base/type-otf.mkiv))
system : cont-err loaded
(c:/ConTeXt/tex/texmf-context/tex/context/base/cont-err.tex
systems : no file 'cont-sys.tex', using 'cont-sys.rme' instead
)
system : t.top loaded
(t.top
){c:/ConTeXt/tex/texmf/fonts/map/dvips/lm/lm-math.map}{c:/ConTeXt/tex/texmf/fonts/map/dvips/lm/lm-rm.map}{c:/ConTeXt/tex/texmf-context/fonts/map/pdftex/context/mkiv-base.map}
bodyfont : 12pt rm is loaded
fonts : preloading latin modern fonts (first stage)
language : language en is active
publications : loading formatting style from bxml-apa
(c:/ConTeXt/tex/texmf-context/tex/context/base/bxml-apa.mkiv)
systems : begin file C:/Lukas/Jobs/Holesov.PDPS/SO_201/Texts/T/t.mkiv at line 4
(t2.mkiv)
! String contains an invalid utf-8 sequence.
l.7 A
ßSÜ
backends > using xmp file 'c:/ConTeXt/tex/texmf-context/tex/context/base/lpdf-pdx.xml'
pages > flushing realpage 1, userpage 1, subpage 1
systems : end file C:/Lukas/Jobs/Holesov.PDPS/SO_201/Texts/T/t.mkiv at line 8
)
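The error in the log can be reproduced outside TeX. The following is a minimal Python sketch, not anything taken from ConTeXt. It assumes the source files are saved in cp1250 (as the \enableregime call suggests) and that the console showing the log uses the OEM code page cp852; the thread states neither explicitly.

```python
# Bytes of the test string as a cp1250-encoded file would store them (assumption).
raw = "AáSš".encode("cp1250")

def is_valid_utf8(data):
    """Return True if the bytes decode as UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# How a console using cp852 would render the bytes that follow "A" (assumption).
console_view = raw[1:].decode("cp852")
```

Under these assumptions the bytes are not valid UTF-8, which matches the "invalid utf-8 sequence" message, and the odd characters after "A" in the log would simply be the undecoded cp1250 bytes as shown by the console.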
Hello,
On Tue, 21 Dec 2010 10:29:13 +0100, Mojca Miklavec
2010/12/21 Procházka Lukáš Ing. - Pontex s. r. o. wrote:
\mainlanguage[cz]
It should probably be cs, not cz.
I'm getting the same error message even with "cs".
Isn't "cz" the country code and "cs" the language code?
BTW: What is the difference between what the country code and the language code affect?
Lukas
The "cz" is left for backward compatibility reasons.
Mojca
-- Ing. Lukáš Procházka [mailto:LPr@pontex.cz] Pontex s. r. o. [mailto:pontex@pontex.cz] [http://www.pontex.cz] Bezová 1658 147 14 Praha 4 Tel: +420 244 062 238 Fax: +420 244 461 038
On 21-12-2010 10:37, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
Hello,
On Tue, 21 Dec 2010 10:29:13 +0100, Mojca Miklavec
wrote: 2010/12/21 Procházka Lukáš Ing. - Pontex s. r. o. wrote:
\mainlanguage[cz]
It should probably be cs, not cz.
one can always use 'czech'
I'm getting the same error message even with "cs".
Regimes were dealt with at the file level but it has to be at the line level for in-document switching.
I uploaded a beta:
- m-database fix
- regimes fix
- split numbering fix
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
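The difference between file-level and line-level regime handling can be sketched in Python. This is only an illustration of the idea as described above, not ConTeXt's actual implementation; the function names and the regular expression are hypothetical, and the file is assumed to be cp1250-encoded.

```python
import re

# Matches \enableregime[name] in a raw, not yet decoded, line.
REGIME_SWITCH = re.compile(rb"\\enableregime\[([^\]]+)\]")

def decode_file_level(data, regime_at_open="utf-8"):
    """Decode the whole file with the regime that was active when it was opened."""
    return data.decode(regime_at_open).splitlines()

def decode_line_level(data, regime="utf-8"):
    """Decode line by line, so a regime switch affects the lines that follow it."""
    lines = []
    for raw_line in data.splitlines():
        lines.append(raw_line.decode(regime))
        switch = REGIME_SWITCH.search(raw_line)
        if switch:
            regime = switch.group(1).decode("ascii")
    return lines

# A main file that switches regime in its first line (cp1250 bytes assumed).
main_file = b"\\enableregime[cp1250]\n" + "AáSš".encode("cp1250") + b"\n"
```

With this model, `decode_file_level(main_file)` raises `UnicodeDecodeError` because the regime was still UTF-8 when the file was opened, while `decode_line_level(main_file)` decodes the second line correctly. A file opened after the switch would be read correctly either way, which is consistent with the included t2.mkiv working in the original report.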
BTW: What is the difference what the COUNTRY code and the LANGUAGE code affects?
In your case, you should use a language code since you want to set a language. ConTeXt has used different sets of codes in the past, but we now try to follow the IETF recommendation "Tags for Identifying Languages" (http://tools.ietf.org/html/bcp47), because it's the only one that can be precise enough for our needs (at least among the tagging systems I know of). For example, you can distinguish between British English and American English by appending a country code to the language code (hence "en-gb" and "en-us", respectively).
Using a country code to identify a language is generally a bad idea and should be discouraged, since that's not what they're meant for, and it can lead to confusion: for example, you could use the code "uk" to identify English as spoken in the United Kingdom, but that's actually the language code for Ukrainian, which has been an actual problem for ConTeXt in the past (in addition to that, "uk" is not even the proper country code for the United Kingdom: it's "gb", as written above; the reason why "uk" is used as a DNS top-level domain is not clear and has led, alas, to even more confusion).
The authorities that decide upon language and country codes are different committees of the ISO; the ISO standard for language codes is ISO 639 (with different parts), and the one for country codes is ISO 3166 (again, with different parts; the two-letter codes of ISO 3166-1 are generally rather well-known because they're the ones being used for DNS top-level domains -- with some exceptions, see above).
Arthur
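To make the structure of such tags concrete, here is a deliberately simplified Python sketch that splits a tag into language, script, region and variant subtags. The function name `parse_tag` is hypothetical, the sketch ignores extensions, private-use subtags and subtag ordering, and it does not check subtags against the IANA registry, so it is far from a full BCP 47 parser.

```python
def parse_tag(tag):
    """Split a simple language tag into its subtags (simplified, no registry check)."""
    subtags = tag.lower().split("-")
    language = subtags[0]
    if not (language.isascii() and language.isalpha() and 2 <= len(language) <= 3):
        raise ValueError("unsupported primary language subtag: %r" % language)
    result = {"language": language, "script": None, "region": None, "variants": []}
    for sub in subtags[1:]:
        if not (sub.isascii() and sub.isalnum()):
            raise ValueError("malformed subtag: %r" % sub)
        if len(sub) == 4 and sub.isalpha():
            result["script"] = sub.title()      # four letters: script
        elif (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit()):
            result["region"] = sub.upper()      # two letters or three digits: region
        elif 5 <= len(sub) <= 8 or (len(sub) == 4 and sub[0].isdigit()):
            result["variants"].append(sub)      # e.g. "1901", "polyton"
        else:
            raise ValueError("unsupported subtag: %r" % sub)
    return result
```

For instance, `parse_tag("en-gb")` yields the language `en` with region `GB`, whereas `parse_tag("uk")` yields only a language subtag: on its own, "uk" is read as a language, never as a country.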
... OK, thanks all for the answers and fixes.
Lukas
On 2010-12-21 <11:11:33>, Arthur Reutenauer wrote:
But context uses some non-standard codes also: “deo” instead of “de-1901” for sane German orthography, and “agr” for ancient Attic instead of “grc” or “el-polyton”. (Testfile appended.)
Philipp
PS: nice tools
http://people.w3.org/rishida/utils/subtags/
http://unicode.org/cldr/utility/languageid.jsp
On Tue, 21 Dec 2010 12:19:15 +0100, Philipp Gesang wrote:
But context uses some non-standard codes also: deo instead of de-1901 for sane German orthography, and agr for ancient Attic instead of grc or el-polyton. (Testfile appended.)
Yes, we should make grc a synonym of agr. el-polyton is, AFAIK, modern polytonic Greek, which is not the same.
Thomas
Yes, we should make grc a synonym of agr. el-polyton is, AFAIK, modern polytonic Greek, which is not the same.
Obviously so. Amusingly, ISO states that "el" is meant for the Greek language after the Fall of Constantinople(*), which assigns a rather long period to Ancient Greek. Note that for all I know, there is no way to distinguish katharevousa from demotic; but it would certainly be added to the IETF registry if someone applied for it.
Arthur
(*) Of course it doesn't say "after the Fall of Constantinople" that way, it says "from 1453 onwards" (http://www.sil.org/iso639-3/documentation.asp?id=ell) but that's clearly what is meant.
On 21-12-2010 1:23, Arthur Reutenauer wrote:
(*) Of course it doesn't say "after the Fall of Constantinople" that way, it says "from 1453 onwards" (http://www.sil.org/iso639-3/documentation.asp?id=ell) but that's clearly what is meant.
given some historic disputes on tex mailing list we should be glad that they avoided a reference to macedonian times
Hans
But context uses some non-standard codes also: “deo” instead of “de-1901” for sane German orthography, and “agr” for ancient Attic instead of “grc” or “el-polyton”. (Testfile appended.)
Yes, that's what I said: ConTeXt has used non-standard codes in the past, now we try to follow BCP 47, while attempting to stay backward-compatible, and providing easy-to-remember names for the user as well. But of course, it's always easier to write a three-line sentence that contains one troll and one severe inaccuracy.
Note that Unicode Language Identifiers are not the same as BCP 47 tags, even if they're very close in their goals and shape.
Arthur
On 21-12-2010 12:19, Philipp Gesang wrote:
But context uses some non-standard codes also: “deo” instead of
deo = de old (and no one objected at that point)
when context was dutch only we used 'du' as that's the abbreviation used in schools
Hans
2010/12/21 Procházka Lukáš Ing. - Pontex s. r. o.
It should probably be cs, not cz.
I'm getting the same error message even with "cs".
As I said: my comment was unrelated to your problem.
Isn't "cz" country code and "cs" language code?
BTW: What is the difference what the COUNTRY code and the LANGUAGE code affects?
If you were a citizen of Austria or Switzerland, you would not be able to use \mainlanguage[at] or \mainlanguage[ch]. You need to use [de] instead (which happens to be the same as the country code for Germany, but that is just "a coincidence"). Country codes usually have no meaning to ConTeXt. In the case of Czech it is just for the sake of backward compatibility that cz works.
Mojca
If you were a citizen of Austria or Switzerland, you would not be able to use \mainlanguage[at] or \mainlanguage[ch]
As it happens, "at" is not the ISO 639 code of any language (and it's doubtful it will be in the near future), but "ch" is the code for Chamorro (http://www.ethnologue.com/show_language.asp?code=cha), so you would indeed have problems :-)
Arthur
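The lookup behaviour discussed in these replies can be modelled with two small tables. This Python sketch is purely illustrative: the tables and the function `resolve_language` are hypothetical and hold only the handful of codes mentioned in this thread, not ConTeXt's real language definitions.

```python
# Language codes mentioned in the thread, with their English names.
LANGUAGES = {
    "cs": "Czech",
    "de": "German",
    "ch": "Chamorro",
    "uk": "Ukrainian",
}

# Legacy or long names accepted for backward compatibility (per the thread).
ALIASES = {
    "cz": "cs",
    "czech": "cs",
}

def resolve_language(name):
    """Return the language code for a name, or None if it is not a known language."""
    code = ALIASES.get(name.lower(), name.lower())
    return code if code in LANGUAGES else None
```

Here `resolve_language("cz")` gives `cs` only because of the alias entry, `resolve_language("at")` gives `None`, and `resolve_language("ch")` succeeds but names Chamorro rather than anything spoken in Switzerland.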
Isn't "cz" country code and "cs" language code?
Indeed -- although it doesn't make much sense that "cs" be the language code for a language name spelt "Czech" in English and "česky" in Czech (I suspect it's related to the fact that "cs" was also the country code for Czechoslovakia in its time, so that the same code was used for the "main" language of that country, so to say).
Arthur
participants (6)
- Arthur Reutenauer
- Hans Hagen
- Mojca Miklavec
- Philipp Gesang
- Procházka Lukáš Ing. - Pontex s. r. o.
- Thomas Schmitz