undefined control sequence bug with German umlaut in bibliography
I came across an issue in context (context ver. 2011.05.18 22:26, LuaTeX ver: beta-0.65.0-2010121421 (rev 4034) ) when trying to cite a bibliography item having an author with a German umlaut "ä". Compiling the short example below produces the following output:

-------------------------------------
...
system > begin file test.tex at line 3
publications > loading database from test.bbl
(test.bbl
! Undefined control sequence.
\@@pbs ->Tr\0 6
\dostartpublication ...@@pby \noexpand \or \@@pbs \noexpand \or \@@pbn \noex...
l.9 \stoppublication
)
backend > xmp > using file 'M:/Programs/context/tex/texmf-local/tex/context/base/lpdf-pdx.xml'
pages > flushing realpage 1, userpage 1, subpage 1
system > end file test.tex at line 5
...
mtx-context | fatal error: return code: 1
----------------------

The strange thing is that when I change the Tr\"{a}ger to Schr\"{a}ger, everything works just fine. Does anybody know what's going on here?!

Julian

---test.tex---------
\setuppublications[alternative=apa,sorttype=bbl]
\setupbibtex[database=test.bib]
\starttext
I'd like to cite \cite[Entry1].
\stoptext

-----test.bib:---this one produces an error-----
@misc{Entry1,
author = {Tr\"{a}ger, D},
title = {{Some Document}},
year = {2006}
}
----------------------

----test.bib--this works---
@misc{Entry1,
author = {Schr\"{a}ger, D},
title = {{Some Document}},
year = {2006}
}
----------------------

--
Julian Becker
Institut für Angewandte Physik, R.123
Westfälische Wilhelms-Universität Münster
Corrensstr. 2/4
48149 Münster / Westfalen
Tel. 0251 83-3 61 53
Mob. 0151 599 848 29
e-mail: j_beck16@uni-muenster.de
"Keep thy heart with all diligence; for it is the wellspring of life."
On Jun 3, 2011, at 8:38 PM, Julian Becker wrote:
I came across an issue in context (context ver. 2011.05.18 22:26, LuaTeX ver: beta-0.65.0-2010121421 (rev 4034) ) when trying to cite a bibliography item having an author with a German umlaut "ä"
From btxdoc, which is part of texlive:

"you must place the entire accented character in braces; in this case either {\"a} or {\"{a}} will do. Furthermore these braces must not themselves be enclosed in braces (other than the ones that might delimit the entire field or the entire entry); and there must be a backslash as the very first character inside the braces..."

Thusly: Tr{\"a}ger

But I admit it's not easy to know that, bibtex documentation is a real mess, and Oren Patashnik appears to suffer from a real disease which prevents him from writing clear sentences and easily parsable documents.

Thomas
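For a bib file that already contains accents in the \"{a} form, as the failing test.bib does, the rewrite into the braced form could be automated. The following is only a minimal Python sketch: the function name wrap_accents and the set of accent commands it covers are assumptions made for illustration, and it is not part of BibTeX, ConTeXt or any bibliography manager.

```python
import re

# Accent commands covered by this sketch (an assumption, not a complete list).
ACCENT = "[\"'`^~]"

# First alternative: an accent that is already fully wrapped, e.g. {\"{a}}.
# Second alternative: a bare accent with a braced letter, e.g. \"{a}.
PATTERN = re.compile(
    r"\{\\" + ACCENT + r"\{[A-Za-z]\}\}"
    + "|"
    + r"\\(" + ACCENT + r")\{([A-Za-z])\}"
)


def _wrap(match):
    # Group 1 is only set when the second (bare) alternative matched.
    if match.group(1) is None:
        return match.group(0)  # already wrapped, leave untouched
    # Build {\<accent><letter>}: a brace group that starts with a backslash.
    return "{\\" + match.group(1) + match.group(2) + "}"


def wrap_accents(bib_text):
    """Rewrite bare accents like \\"{a} into the braced form {\\"a}."""
    return PATTERN.sub(_wrap, bib_text)


print(wrap_accents('author = {Tr\\"{a}ger, D},'))
```

The sketch does not check the second btxdoc condition (that the new braces are not nested inside other braces), so doubly braced fields such as the title in the example would still need a manual look.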
On Fri 03 Jun 2011, Thomas A. Schmitz wrote:
But I admit it's not easy to know that, bibtex documentation is a real mess
Patience please! ‘This document will be expanded when BibTEX version 1.00 comes out’ -- BIBTEXing, February 8, 1988.
:-)
Pont
Thank you everybody for your answers. Writing Tr{\"a}ger as Thomas
suggested works well, but unfortunately, I'm using Mendeley Desktop for the
management of my bibtex file and I can't seem to be able to influence the
way in which it encodes the special characters.
@Mojca: Indeed, it also fails if I just write "Träger" in UTF-8 encoding,
whereas "Schräger" works just fine. All combinations where the "ä" is in
the third position of the word seem to fail. The error message is similar but
slightly different now. The log file shows the following:
---------------
system > begin file test.tex at line 3
publications > loading database from test.bbl
(test.bbl
! String contains an invalid utf-8 sequence.
l.1 \setuppublicationlist[samplesize={Tr
Ã06},totalnumber=1]
A funny symbol that I can't read has just been (re)read.
Just continue, I'll change it to 0xFFFD.
! String contains an invalid utf-8 sequence.
l.5 n=1,s=Tr
Ã06]
A funny symbol that I can't read has just been (re)read.
Just continue, I'll change it to 0xFFFD.
! String contains an invalid utf-8 sequence.
\doifassignmentelse ...gnmentelse \detokenize {#1}
=@@\@end@ \expandafter \se...
\dostartpublication ... ->\doifassignmentelse {#1}
{\getparameters [\??pb ][k...
l.9 \stoppublication
A funny symbol that I can't read has just been (re)read.
Just continue, I'll change it to 0xFFFD.
)
-----------------------
Julian
___________________________________________________________________________________ If your question is of interest to others as well, please add an entry to the Wiki!
maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context webpage : http://www.pragma-ade.nl / http://tex.aanhet.net archive : http://foundry.supelec.fr/projects/contextrev/ wiki : http://contextgarden.net
___________________________________________________________________________________
I can also add that in the first case, with the author name "Träger", the
generated bbl file looks messed up, and Notepad++ doesn't recognize the
encoding as UTF-8. Changing the encoding to UTF-8 manually shows the complete
name "Träger" correctly, but the abbreviation (which should have been
"Trä06") seems to be messed up.
I'm not familiar with the intricacies and details of UTF8 encoding, but is
it possible that there is a byte missing from the "ä" which has been cut off
during the abbreviation process?
The abbreviated "Trä06" seems to be incorrectly encoded (in hexadecimal) as:
54 C3 A4 30 36,
while it should be: 54 72 C3 A4 30 36.
So in the abbreviation process, the encoding of some characters over several
bytes seems to be neglected.
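
The suspicion above can be tried out in isolation. The Python sketch below assumes, purely for illustration, that the abbreviation is built from the first three bytes of the surname plus the last two digits of the year; the helper names byte_prefix_label and is_valid_utf8 are made up, and this is not BibTeX's actual code. Under that assumption "Träger" is cut in the middle of the two-byte "ä", which would fit the "Tr" followed by an unreadable byte and "06" in the log, while "Schräger" is cut after plain ASCII letters.

```python
def byte_prefix_label(surname, year, n_bytes=3):
    """Hypothetical label: first n_bytes of the UTF-8 surname + 2-digit year."""
    raw = surname.encode("utf-8")
    return raw[:n_bytes] + year[-2:].encode("ascii")


def is_valid_utf8(raw):
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


for surname in ("Träger", "Schräger"):
    label = byte_prefix_label(surname, "2006")
    # Show the label bytes in hex and whether they still form valid UTF-8.
    print(surname, label.hex(" "), is_valid_utf8(label))
```

Cutting by characters instead of bytes (surname[:3] before encoding) would keep the "ä" whole in this sketch.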
I attached the bbl files for both cases to this e-mail, since I don't know
what would happen to the encoding if I just pasted them as plain text here.
Julian
On 06/04/2011 11:23 AM, Julian Becker wrote:
Thank you everybody for your answers. Writing Tr{\"a}ger as Thomas suggested works well, but unfortunately, I'm using Mendeley Desktop for the management of my bibtex file and I can't seem to be able to influence the way in which it encodes the special characters.
Find a different program, then. Bibtex does *not* deal with UTF-8 correctly, period. (complaints to Oren Patashnik please ;))

Best wishes,
Taco
Am Sat, 04 Jun 2011 11:26:37 +0200 schrieb Taco Hoekwater:
Find a different program, then. Bibtex does *not* deal with UTF-8 correctly, period. (complaints to Oren Patashnik please ;))
Doesn't context support biber? Or could Julian save his bib file in an 8-bit encoding and have context read the bbl as 8-bit?

--
Ulrike Fischer
On Sat 04 Jun 2011, Julian Becker wrote:
I'm not familiar with the intricacies and details of UTF8 encoding, but is it possible that there is a byte missing from the "ä" which has been cut off during the abbreviation process?
Well, there *is* more than one way to represent ä in UTF-8, but it's my understanding that anything beyond ASCII is simply not supported by BibTeX. You can get away with it in fields that just get pasted verbatim into the output (usually), but the first three letters of the first author's name are used to construct the key (which is why ‘Schräger’ worked) so there's no way around using the officially sanctioned {\"a} form.
unfortunately, I'm using Mendeley Desktop for the management of my bibtex file and I can't seem to be able to influence the way in which it encodes the special characters.
This is one reason why I still use plain emacs as a bibliography manager -- sooner or later you need a hack, and that's harder when the raw BibTeX is hidden or generated.

In this case you may need to put the hack between Mendeley and BibTeX: pipe the file through ‘sed -e 's/ä/{\\"a}/g'’ or something similar. And/or ask in the Mendeley support forums, since this is a fairly well-known BibTeX ‘feature’ so perhaps someone else has had to deal with it there.

Hope this helps,
Pont
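
The same kind of filter can also be written as a small script when more than one character has to be handled. This is a minimal Python sketch under stated assumptions: the function name escape_umlauts and the choice of characters in the table are illustrative only, and the input is assumed to be UTF-8 text with precomposed characters.

```python
# Illustrative table: German umlauts and sharp s mapped to braced TeX forms.
UMLAUT_TABLE = str.maketrans({
    "ä": '{\\"a}',
    "ö": '{\\"o}',
    "ü": '{\\"u}',
    "Ä": '{\\"A}',
    "Ö": '{\\"O}',
    "Ü": '{\\"U}',
    "ß": "{\\ss}",
})


def escape_umlauts(bib_text):
    """Replace each listed character with its braced TeX accent form."""
    return bib_text.translate(UMLAUT_TABLE)


print(escape_umlauts("author = {Träger, D},"))
```

Applied to the whole bib file between export and the BibTeX run, this plays the same role as the sed one-liner; fields where the raw character is actually wanted would need to be excluded by hand.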
I think I'll go for the piping option then, which seems to be the easiest
way out.
Thanks for the insights, I didn't actually know much about the interplay of
context and bibtex until this little problem occurred to me...
Julian
Well, there *is* more than one way to represent ä in UTF-8
If you mean "non-shortest" forms such as 0xE0 0x83 0xA4 or 0xF0 0x80 0x83 0xA4, then no, they have been forbidden since Unicode 3 in 2000 (formally Corrigendum #1, see http://www.unicode.org/versions/corrigendum1.html). Arthur
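
A strict UTF-8 decoder can be used to check this. The sketch below relies on the decoder built into Python 3, which rejects non-shortest forms; the helper name decodes_as_utf8 is made up for the example.

```python
def decodes_as_utf8(raw):
    """Return True if a strict UTF-8 decoder accepts the byte string."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


shortest = bytes([0xC3, 0xA4])               # the two-byte form of U+00E4
overlong3 = bytes([0xE0, 0x83, 0xA4])        # three-byte non-shortest form
overlong4 = bytes([0xF0, 0x80, 0x83, 0xA4])  # four-byte non-shortest form

for raw in (shortest, overlong3, overlong4):
    print(raw.hex(" "), decodes_as_utf8(raw))
```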
On Mon 06 Jun 2011, Arthur Reutenauer wrote:
If you mean "non-shortest" forms such as 0xE0 0x83 0xA4 or 0xF0 0x80 0x83 0xA4, then no, they have been forbidden since Unicode 3 in 2000 (formally Corrigendum #1, see http://www.unicode.org/versions/corrigendum1.html).
I was actually thinking of precomposed vs. combining diacritics. I was blissfully unaware of the non-shortest-form problem up until now... Pont
I was actually thinking of precomposed vs. combining diacritics. I was blissfully unaware of the non-shortest-form problem up until now...
Ah, OK. But that's exactly the issue for which canonical equivalence was designed, and in a Unicode-aware version of BibTeX that shouldn't be an issue. However, for now... Arthur
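
The difference between the precomposed and the combining form, and the way normalization brings them together, can be seen with Python's standard unicodedata module. This is only an illustration of canonical equivalence, not of anything BibTeX does.

```python
import unicodedata

precomposed = "\u00e4"  # LATIN SMALL LETTER A WITH DIAERESIS
combining = "a\u0308"   # 'a' followed by COMBINING DIAERESIS

# The two forms render the same but are different code point sequences,
# and therefore different UTF-8 byte sequences.
print(precomposed.encode("utf-8").hex(" "))
print(combining.encode("utf-8").hex(" "))
print(precomposed == combining)

# Normalizing both to NFC (composed) or NFD (decomposed) makes them compare equal.
print(unicodedata.normalize("NFC", combining) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == combining)
```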
participants (6)
- Arthur Reutenauer
- Julian Becker
- Pontus Lurcock
- Taco Hoekwater
- Thomas A. Schmitz
- Ulrike Fischer