Hi Arthur,
On Thu, 12 Mar 2015 16:35:47 +0000, Arthur Reutenauer wrote:
The luatex code contains the lines (in unistring.w)
    if (val == 0xFFFD) utf_error();
    return (val);
in a function str2uni. I didn't really try to understand the code, but it looks as if 0xFFFD is used as an "invalid" marker.
Interesting. This is not actually correct: U+FFFD is a valid Unicode character, so it would be better to use U+FFFE or U+FFFF for that.
Note that U+FFFD is the recommended character to use when a character can't be recognised while converting to Unicode from another encoding, so its presence is usually a sign that something went wrong upstream, but I assume Manfred is aware of that.
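To illustrate the point, here is a minimal sketch (not the LuaTeX code; decode_bmp and DECODE_ERROR are made-up names) of a decoder that reports malformed input with a noncharacter sentinel, so that a genuine U+FFFD in the input is passed through instead of being treated as an error. It assumes a NUL-terminated buffer, handles only sequences of up to three bytes, and skips the overlong and surrogate checks for brevity.

```c
/* Hypothetical sentinel: U+FFFF is a noncharacter, so it never
   stands for a real character in interchanged text. */
#define DECODE_ERROR 0xFFFFu

/* Decode one UTF-8 sequence of at most three bytes from a
   NUL-terminated buffer. Returns the code point, or DECODE_ERROR
   if the lead byte or a continuation byte is malformed. */
static unsigned decode_bmp(const unsigned char *s)
{
    if (s[0] < 0x80)                       /* single byte (ASCII) */
        return s[0];
    if ((s[0] & 0xE0) == 0xC0 &&           /* two-byte sequence */
        (s[1] & 0xC0) == 0x80)
        return ((s[0] & 0x1Fu) << 6) | (s[1] & 0x3Fu);
    if ((s[0] & 0xF0) == 0xE0 &&           /* three-byte sequence */
        (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80)
        return ((s[0] & 0x0Fu) << 12) | ((s[1] & 0x3Fu) << 6) | (s[2] & 0x3Fu);
    return DECODE_ERROR;                   /* anything else is rejected */
}
```

With this sketch, the bytes EF BF BD decode to 0xFFFD, while the truncated sequence EF BF yields DECODE_ERROR. Note that a well-formed encoding of U+FFFF itself (EF BF BF) would still collide with the sentinel unless it is rejected separately.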
Yes, I'm aware of that. So I also think that it isn't correct to use U+FFFD for this. Your suggestion of using either U+FFFE or U+FFFF sounds good, as both are noncharacters that will never be assigned to a real character.
--
Best,
Manfred