[NTG-context] Unicode question
manfred.lotz at arcor.de
Thu Mar 12 19:08:06 CET 2015
On Thu, 12 Mar 2015 16:35:47 +0000
Arthur Reutenauer <arthur.reutenauer at normalesup.org> wrote:
> > The luatex code contains the lines (in unistring.w)
> >   if (val == 0xFFFD)
> >       utf_error();
> >   return (val);
> > in a function str2uni. I didn't really try to understand the code
> > but it looks as if 0xFFFD is used as "invalid marker":
> Interesting. This is not actually correct, U+FFFD is a valid Unicode
> character; it would be better to use U+FFFE or U+FFFF for that.
> Note that U+FFFD is the recommended character to use when a character
> can't be recognised while converting to Unicode from another
> encoding, so its presence is usually a sign that something went wrong
> upstream, but I assume Manfred is aware of that.
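Yes, I'm aware of that. I agree that U+FFFD is the wrong value to use
as an error marker, since it can legitimately appear in the input. Your
suggestion of using either U+FFFE or U+FFFF sounds good: both are
noncharacters, so neither should ever occur in text that is exchanged.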
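
To make the idea concrete, here is a minimal sketch of a decoder that
reports failure with U+FFFF instead of U+FFFD. It is not the luatex
code: the names decode_utf8 and DECODE_ERROR are made up for
illustration, and I am only assuming that str2uni does something
broadly similar.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error sentinel (not from luatex): U+FFFF is a noncharacter. */
#define DECODE_ERROR 0xFFFFu

/* Decode one UTF-8 sequence from s (at most len bytes).
 * On success, return the code point and store the byte count in *consumed.
 * On malformed input, return DECODE_ERROR and store 0 in *consumed. */
uint32_t decode_utf8(const unsigned char *s, size_t len, size_t *consumed)
{
    /* Smallest code point allowed for 1-, 2-, 3- and 4-byte sequences. */
    static const uint32_t min_val[] = { 0x0, 0x80, 0x800, 0x10000 };
    uint32_t val;
    size_t need, i;

    assert(s != NULL && consumed != NULL);
    *consumed = 0;
    if (len == 0)
        return DECODE_ERROR;

    if (s[0] < 0x80) {                  /* 0xxxxxxx: ASCII */
        val = s[0];        need = 0;
    } else if ((s[0] & 0xE0) == 0xC0) { /* 110xxxxx: one continuation byte */
        val = s[0] & 0x1F; need = 1;
    } else if ((s[0] & 0xF0) == 0xE0) { /* 1110xxxx: two continuation bytes */
        val = s[0] & 0x0F; need = 2;
    } else if ((s[0] & 0xF8) == 0xF0) { /* 11110xxx: three continuation bytes */
        val = s[0] & 0x07; need = 3;
    } else {
        return DECODE_ERROR;            /* stray continuation or invalid lead */
    }

    if (len < need + 1)
        return DECODE_ERROR;            /* sequence is truncated */

    for (i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return DECODE_ERROR;        /* not a continuation byte */
        val = (val << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates and values beyond U+10FFFF. */
    if (val < min_val[need] || val > 0x10FFFF ||
        (val >= 0xD800 && val <= 0xDFFF))
        return DECODE_ERROR;

    *consumed = need + 1;
    return val;
}
```

With this, the bytes EF BF BD decode to 0xFFFD like any other
character, while a broken sequence such as C3 28 yields DECODE_ERROR.
If the input really contains an encoded U+FFFF, the return value is the
same as the sentinel, but *consumed is 3 rather than 0, so a caller can
still tell the two cases apart.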