undefined control sequence bug with German umlaut in bibliography

Julian Becker

3 Jun 2011

8:38 p.m.

I came across an issue in ConTeXt (ConTeXt ver. 2011.05.18 22:26, LuaTeX ver. beta-0.65.0-2010121421 (rev 4034)) when trying to cite a bibliography item whose author name contains a German umlaut "ä". Compiling the short example below produces the following output:

-------------------------------------
...
system > begin file test.tex at line 3
publications > loading database from test.bbl
(test.bbl
! Undefined control sequence.
\@@pbs ->Tr\0 6
\dostartpublication ...@@pby \noexpand \or \@@pbs \noexpand \or \@@pbn \noex...
l.9 \stoppublication
)
backend > xmp > using file 'M:/Programs/context/tex/texmf-local/tex/context/base/lpdf-pdx.xml'
pages > flushing realpage 1, userpage 1, subpage 1
system > end file test.tex at line 5
...
mtx-context | fatal error: return code: 1
----------------------

The strange thing is that when I change the Tr\"{a}ger to Schr\"{a}ger, everything works just fine. Does anybody know what's going on here?!

Julian

---test.tex---------
\setuppublications[alternative=apa,sorttype=bbl]
\setupbibtex[database=test.bib]
\starttext
I'd like to cite \cite[Entry1].
\stoptext

-----test.bib:---this one produces an error-----
@misc{Entry1,
author = {Tr\"{a}ger, D},
title = {{Some Document}},
year = {2006}
}
----------------------

----test.bib--this works---
@misc{Entry1,
author = {Schr\"{a}ger, D},
title = {{Some Document}},
year = {2006}
}
----------------------

--
Julian Becker
Institut für Angewandte Physik, R.123
Westfälische Wilhelms-Universität Münster
Corrensstr. 2/4
48149 Münster / Westfalen
Tel. 0251 83-3 61 53
Mob. 0151 599 848 29
e-mail: j_beck16@uni-muenster.de

"Keep thy heart with all diligence; for it is the wellspring of life."


Thomas A. Schmitz

3 Jun

9:03 p.m.

On Jun 3, 2011, at 8:38 PM, Julian Becker wrote:

...

I came across an issue in context (context ver. 2011.05.18 22:26, LuaTeX ver: beta-0.65.0-2010121421 (rev 4034) ) when trying to cite a bibliography item having an author with a German umlaut "ä"

From btxdoc, which is part of texlive: "you must place the entire accented character in braces; in this case either {\"a} or {\"{a}} will do. Furthermore these braces must not themselves be enclosed in braces (other than the ones that might delimit the entire field or the entire entry); and there must be a backslash as the very first character inside the braces..."

Thusly: Tr{\"a}ger

But I admit it's not easy to know that, bibtex documentation is a real mess, and Oren Patashnik appears to suffer from a real disease which prevents him from writing clear sentences and easily parsable documents.

Thomas
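For files exported by a tool that writes the \"{a} form, the btxdoc rule quoted above can be applied mechanically. The following is only a minimal sketch in Python, not something posted in the thread: the function name `wrap_umlaut_accents` is hypothetical, it handles only the \" accent on a single letter, and it deliberately skips any occurrence directly preceded by an opening brace so that an already wrapped {\"{a}} is left alone.

```python
import re

# Matches \"{x} for a single ASCII letter x, unless the backslash is
# directly preceded by "{" (assumed to mean it is already wrapped).
_UMLAUT_ACCENT = re.compile(r'(?<!\{)\\"\{([A-Za-z])\}')


def wrap_umlaut_accents(text: str) -> str:
    """Rewrite Tr\\"{a}ger as Tr{\\"a}ger (hypothetical helper)."""
    # In the replacement template, "\\\\" is a literal backslash and
    # "\\1" is the captured letter.
    return _UMLAUT_ACCENT.sub(r'{\\"\1}', text)


if __name__ == "__main__":
    print(wrap_umlaut_accents('author = {Tr\\"{a}ger, D},'))
```

One known limitation of this heuristic: an accent that is the very first thing inside the field braces, such as author = {\"{O}...}, is also preceded by a brace and is therefore left unchanged.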

Pontus Lurcock

4 Jun

2:31 a.m.

On Fri 03 Jun 2011, Thomas A. Schmitz wrote:

...

But I admit it's not easy to know that, bibtex documentation is a real mess

Patience please! ‘This document will be expanded when BibTEX version 1.00 comes out’ -- BIBTEXing, February 8, 1988. :-) Pont

Julian Becker

10:43 a.m.

Thank you everybody for your answers. Writing Tr{\"a}ger as Thomas suggested works well, but unfortunately, I'm using Mendeley Desktop for the management of my bibtex file and I can't seem to be able to influence the way in which it encodes the special characters.

@Mojca: Indeed, it also fails if I just write "Träger" in UTF-8 encoding. However, "Schräger" works just fine; all combinations where the "ä" is in the third position of the word seem to fail. The error message is similar but slightly different now. The log file shows the following:

---------------
system > begin file test.tex at line 3
publications > loading database from test.bbl
(test.bbl
! String contains an invalid utf-8 sequence.
l.1 \setuppublicationlist[samplesize={Tr Ã06},totalnumber=1]
A funny symbol that I can't read has just been (re)read.
Just continue, I'll change it to 0xFFFD.

! String contains an invalid utf-8 sequence.
l.5 n=1,s=Tr Ã06]
A funny symbol that I can't read has just been (re)read.
Just continue, I'll change it to 0xFFFD.

! String contains an invalid utf-8 sequence.
\doifassignmentelse ...gnmentelse \detokenize {#1} =@@\@end@ \expandafter \se...
\dostartpublication ... ->\doifassignmentelse {#1} {\getparameters [\??pb ][k...
l.9 \stoppublication
A funny symbol that I can't read has just been (re)read.
Just continue, I'll change it to 0xFFFD.
)
-----------------------

Julian


___________________________________________________________________________________ If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context webpage : http://www.pragma-ade.nl / http://tex.aanhet.net archive : http://foundry.supelec.fr/projects/contextrev/ wiki : http://contextgarden.net

___________________________________________________________________________________


Julian Becker

11:23 a.m.

I can also add that in the first case, with the author name "Träger", the generated bbl-file looks messed up (Notepad++ doesn't recognize the encoding as UTF8). Changing the encoding to UTF8 manually shows the complete name "Träger" correctly, but the abbreviations (what should have been "Trä06") seem to be messed up.

I'm not familiar with the intricacies and details of UTF8 encoding, but is it possible that there is a byte missing from the "ä" which has been cut off during the abbreviation process? The abbreviated "Trä06" seems to be incorrectly encoded (in hexadecimal) as: 54 C3 A4 30 36, while it should be: 54 72 C3 A4 30 36. So in the abbreviation process, the encoding of some characters over several bytes seems to be neglected.

I attached the bbl-files for both cases to this e-mail, since I don't know what would happen to the encoding if I just pasted them as plain text here.

Julian


Taco Hoekwater

11:26 a.m.

On 06/04/2011 11:23 AM, Julian Becker wrote:

...

Thank you everybody for your answers. Writing Tr{\"a}ger as Thomas suggested works well, but unfortunately, I'm using Mendeley Desktop for the management of my bibtex file and I can't seem to be able to influence the way in which it encodes the special characters.

Find a different program, then. Bibtex does *not* deal with UTF-8 correctly, period. (complaints to Oren Patashnik please ;)) Best wishes, Taco

Ulrike Fischer

7 Jun

9:42 a.m.

Am Sat, 04 Jun 2011 11:26:37 +0200 schrieb Taco Hoekwater:

...

...
Thank you everybody for your answers. Writing Tr{\"a}ger as Thomas suggested works well, but unfortunately, I'm using Mendeley Desktop for the management of my bibtex file and I can't seem to be able to influence the way in which it encodes the special characters.

...

Find a different program, then. Bibtex does *not* deal with UTF-8 correctly, period. (complaints to Oren Patashnik please ;))

Doesn't context support biber? Or could Julian save his bib file in an 8-bit encoding and have context read the bbl as 8-bit?

-- Ulrike Fischer

Pontus Lurcock

4 Jun

12:07 p.m.

On Sat 04 Jun 2011, Julian Becker wrote:

...

I'm not familiar with the intricacies and details of UTF8 encoding, but is it possible that there is a byte missing from the "ä" which has been cut off during the abbreviation process?

Well, there *is* more than one way to represent ä in UTF-8, but it's my understanding that anything beyond ASCII is simply not supported by BibTeX. You can get away with it in fields that just get pasted verbatim into the output (usually), but the first three letters of the first author's name are used to construct the key (which is why ‘Schräger’ worked) so there's no way around using the officially sanctioned {\"a} form.
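The effect described here can be reproduced outside BibTeX. The sketch below is in Python and rests on an assumption: that the label is built from the first three 8-bit units of the name rather than the first three characters, which is how a tool without UTF-8 support would see the text. The function names are hypothetical. For "Träger" the cut lands in the middle of the two-byte "ä", which would be consistent with the "Tr Ã06" shown in the log earlier in the thread; for "Schräger" the first three bytes are plain ASCII.

```python
def label_prefix_bytes(name: str, n: int = 3) -> bytes:
    """Return the first n bytes of the UTF-8 form of name (assumed model)."""
    return name.encode("utf-8")[:n]


def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def hexdump(data: bytes) -> str:
    """Space-separated upper-case hex, e.g. '54 72 C3'."""
    return " ".join(f"{b:02X}" for b in data)


if __name__ == "__main__":
    for name in ("Träger", "Schräger"):
        label = label_prefix_bytes(name) + b"06"
        print(name, "->", hexdump(label), "valid UTF-8:", is_valid_utf8(label))
```

Under that assumption the label for "Träger" comes out as 54 72 C3 30 36: the lead byte C3 is kept and its continuation byte A4 is dropped, so LuaTeX has good reason to reject the string.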

...

...
unfortunately, I'm using Mendeley Desktop for the management of my bibtex file and I can't seem to be able to influence the way in which it encodes the special characters.

This is one reason why I still use plain emacs as a bibliography manager -- sooner or later you need a hack, and that's harder when the raw BibTeX is hidden or generated. In this case you may need to put the hack between Mendeley and BibTeX: pipe the file through ‘sed -e 's/ä/{\\"a}/g'’ or something similar. And/or ask in the Mendeley support forums, since this is a fairly well-known BibTeX ‘feature’ so perhaps someone else has had to deal with it there. Hope this helps, Pont
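If sed is not at hand (the original poster's paths suggest Windows), the same substitution can be sketched in Python. This is an illustration only, not a tested Mendeley workflow: the mapping table and the function name `escape_umlauts` are assumptions, only the precomposed German characters are covered, and the {\ss} form for "ß" is added on the same pattern as the {\"a} form mentioned above.

```python
# Hypothetical mapping from precomposed characters to BibTeX escapes.
UMLAUT_ESCAPES = {
    "ä": '{\\"a}', "ö": '{\\"o}', "ü": '{\\"u}',
    "Ä": '{\\"A}', "Ö": '{\\"O}', "Ü": '{\\"U}',
    "ß": "{\\ss}",
}
_TABLE = str.maketrans(UMLAUT_ESCAPES)


def escape_umlauts(text: str) -> str:
    """Replace each mapped character with its braced escape sequence."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(escape_umlauts("author = {Träger, D},"))
```

To filter a whole file, one could read it with encoding="utf-8", pass the contents through `escape_umlauts`, and write the result to the file that ConTeXt's \setupbibtex points at.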

Julian Becker

12:32 p.m.

I think I'll go for the piping option then, which seems to be the easiest way out. Thanks for the insights, I didn't actually know much about the interplay of context and bibtex until this little problem occurred to me...

Julian


Arthur Reutenauer

6 Jun

12:11 p.m.

...

Well, there *is* more than one way to represent ä in UTF-8

If you mean "non-shortest" forms such as 0xE0 0x83 0xA4 or 0xF0 0x80 0x83 0xA4, then no, they have been forbidden since Unicode 3 in 2000 (formally Corrigendum #1, see http://www.unicode.org/versions/corrigendum1.html). Arthur
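A strict decoder shows the difference between the shortest form and the two non-shortest forms listed above. The sketch below uses Python's built-in UTF-8 codec, which rejects overlong sequences; the helper name `decodes_as_utf8` is hypothetical.

```python
def decodes_as_utf8(data: bytes) -> bool:
    """True if a strict UTF-8 decoder accepts data."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Shortest form of U+00E4 and the two overlong forms from the message.
shortest = bytes([0xC3, 0xA4])
overlong_3 = bytes([0xE0, 0x83, 0xA4])
overlong_4 = bytes([0xF0, 0x80, 0x83, 0xA4])

if __name__ == "__main__":
    for label, seq in (("shortest", shortest),
                       ("overlong, 3 bytes", overlong_3),
                       ("overlong, 4 bytes", overlong_4)):
        print(label, decodes_as_utf8(seq))
```

Only the two-byte sequence C3 A4 is accepted and decodes to "ä"; both longer sequences raise a decoding error.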

Pontus Lurcock

12:29 p.m.

On Mon 06 Jun 2011, Arthur Reutenauer wrote:

...

...
Well, there *is* more than one way to represent ä in UTF-8

If you mean "non-shortest" forms such as 0xE0 0x83 0xA4 or 0xF0 0x80 0x83 0xA4, then no, they have been forbidden since Unicode 3 in 2000 (formally Corrigendum #1, see http://www.unicode.org/versions/corrigendum1.html).

I was actually thinking of precomposed vs. combining diacritics. I was blissfully unaware of the non-shortest-form problem up until now... Pont

Arthur Reutenauer

12:44 p.m.

...

I was actually thinking of precomposed vs. combining diacritics. I was blissfully unaware of the non-shortest-form problem up until now...

Ah, OK. But that's exactly the issue for which canonical equivalence was designed, and in a Unicode-aware version of BibTeX that shouldn't be an issue. However, for now... Arthur
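The precomposed and combining spellings of "ä", and the way normalization maps one onto the other, can be illustrated with Python's standard `unicodedata` module. This is a sketch of the general Unicode mechanism only; nothing here is specific to BibTeX or ConTeXt, and the variable names are chosen for illustration.

```python
import unicodedata

precomposed = "\u00e4"   # LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"   # "a" followed by COMBINING DIAERESIS

# The two spellings are different code point sequences with different
# UTF-8 encodings, but they are canonically equivalent.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)

if __name__ == "__main__":
    print(precomposed.encode("utf-8").hex(), decomposed.encode("utf-8").hex())
    print(precomposed == decomposed, nfc == precomposed, nfd == decomposed)
```

A tool that normalizes its input to one form before comparing or abbreviating names would treat both spellings alike; a tool that works on raw bytes sees two unrelated strings.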
