On Thu, 12 Mar 2015 08:48:27 +0100, Manfred Lotz wrote:
Hi all,

If I run this minimal example
\starttext
\startluacode
�
\stopluacode
\stoptext
I get
tex error > error on line 3 in file /data/tmp/u1.tex: ! String contains an invalid utf-8 sequence
and some more lines.
The character above is:
Character: �
Character name: REPLACEMENT CHARACTER
Charblock: Specials
Category: Other symbol
Unicode: U+fffd
UTF8: 0xefbfbd
which is a valid utf8 character.
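
The claim can be checked independently of LuaTeX. The following is only a small sketch using Python's standard library (Python is not part of the original report); it decodes the three bytes listed above with strict error handling, which would raise an exception if they were not well-formed UTF-8.

```python
import unicodedata

# The three bytes reported above for the character.
raw = bytes.fromhex("efbfbd")

# errors="strict" raises UnicodeDecodeError on malformed input,
# so reaching the next line means the bytes are well-formed UTF-8.
text = raw.decode("utf-8", errors="strict")

print(len(text), hex(ord(text)), unicodedata.name(text))
```

If the decode succeeds, the bytes form a single well-formed code point, so the error has to come from how the engine treats that particular value.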
Questions:
1. Why is it considered to be invalid?
This is not a ConTeXt question or problem; it is related to the binary (you would get the same error with lualatex or plain). The LuaTeX code contains these lines (in unistring.w), in a function str2uni:

    if (val == 0xFFFD)
        utf_error();
    return (val);

I didn't really try to understand the code, but it looks as if 0xFFFD is used as an "invalid" marker: if LuaTeX encounters something that isn't valid UTF-8 it maps val to 0xFFFD, and then tests against 0xFFFD to raise an error. That would make a genuine U+FFFD in the input indistinguishable from a decoding failure.
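
To illustrate the pattern, here is a minimal Python sketch of a decoder that uses 0xFFFD as its "invalid" marker. It is not the LuaTeX source: the names decode_first, Utf8Error and REPLACEMENT are made up for this example, and it skips checks a full decoder needs (overlong forms, surrogates, the U+10FFFF upper limit). It only shows why a sentinel that is itself a legal code point causes the behaviour reported above.

```python
REPLACEMENT = 0xFFFD  # hypothetical name; doubles as the "invalid" marker


class Utf8Error(ValueError):
    """Raised when decoding ends with the sentinel value."""


def _is_continuation(byte):
    # Continuation bytes have the bit pattern 10xxxxxx.
    return 0x80 <= byte <= 0xBF


def decode_first(data):
    """Decode the first code point of a bytes object, sentinel style."""
    if not data:
        raise Utf8Error("empty input")
    val = REPLACEMENT  # start as "invalid"; overwritten only on success
    lead = data[0]
    if lead < 0x80:
        val = lead
    elif 0xC0 <= lead <= 0xDF:
        if len(data) >= 2 and _is_continuation(data[1]):
            val = ((lead & 0x1F) << 6) | (data[1] & 0x3F)
    elif 0xE0 <= lead <= 0xEF:
        if len(data) >= 3 and all(map(_is_continuation, data[1:3])):
            val = (((lead & 0x0F) << 12)
                   | ((data[1] & 0x3F) << 6)
                   | (data[2] & 0x3F))
    elif 0xF0 <= lead <= 0xF7:
        if len(data) >= 4 and all(map(_is_continuation, data[1:4])):
            val = (((lead & 0x07) << 18)
                   | ((data[1] & 0x3F) << 12)
                   | ((data[2] & 0x3F) << 6)
                   | (data[3] & 0x3F))
    # Stray continuation bytes and lead bytes from 0xF8 upwards fall
    # through every branch and leave val at the sentinel.
    if val == REPLACEMENT:
        # The flaw: a correctly decoded U+FFFD also ends up here.
        raise Utf8Error("invalid UTF-8 sequence")
    return val
```

With this sketch, decode_first(b"\xef\xbf\xbd") raises Utf8Error even though the three bytes decode cleanly, because the result equals the marker. A separate success flag, or a marker outside the code point range such as -1, would avoid the ambiguity.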
2. Are there other valid utf8 characters which are considered invalid?
The comment in the code says

    /* the 5- and 6-byte UTF-8 sequences generate integers that are outside of the valid UCS range, and therefore unsupported */

--
Ulrike Fischer
http://www.troubleshooting-tex.de/