[NTG-context] Unicode question

Manfred Lotz manfred.lotz at arcor.de
Thu Mar 12 19:08:06 CET 2015

Hi Arthur,

On Thu, 12 Mar 2015 16:35:47 +0000
Arthur Reutenauer <arthur.reutenauer at normalesup.org> wrote:

> > The luatex code contains the lines (in unistring.w)
> > 
> >     if (val == 0xFFFD)
> >         utf_error();
> >     return (val);
> > 
> > in a function str2uni. I didn't really try to understand the code
> > but it looks as if 0xFFFD is used as "invalid marker":
> Interesting.  This is not actually correct, U+FFFD is a valid Unicode
> character; it would be better to use U+FFFE or U+FFFF for that.
> Note that U+FFFD is the recommended character to use when a character
> can't be recognised while converting to Unicode from another
> encoding, so its presence is usually a sign that something went wrong
> upstream, but I assume Manfred is aware of that.

Yes, I'm aware of that, and I agree that it isn't correct to use
U+FFFD for this. Your suggestion of using either U+FFFE or U+FFFF
sounds good, as both are noncharacters, which Unicode reserves for
internal use.
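
To make the point concrete, here is a minimal C sketch of a UTF-8
decoder that reports errors with U+FFFF instead of U+FFFD. It is not
the str2uni from unistring.w; the names decode_utf8 and DECODE_ERROR
are made up for this example, and the choice of U+FFFF rather than
U+FFFE is arbitrary. It only shows that with a noncharacter as the
marker, a U+FFFD that really is in the input decodes normally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error marker (not taken from LuaTeX). U+FFFF is a
   noncharacter reserved for internal use, so it does not collide
   with a real character such as U+FFFD. */
#define DECODE_ERROR 0xFFFFu

/* Decode the first UTF-8 sequence of a NUL-terminated string.
   Returns the code point, or DECODE_ERROR for malformed input. */
unsigned decode_utf8(const unsigned char *s)
{
    /* Smallest code point allowed for 1, 2, 3 and 4 byte sequences. */
    static const unsigned min_val[] = { 0x0, 0x80, 0x800, 0x10000 };
    unsigned val;
    int extra, i;

    assert(s != NULL);

    if (s[0] < 0x80)
        return s[0];                       /* plain ASCII */
    else if ((s[0] & 0xE0) == 0xC0) { val = s[0] & 0x1F; extra = 1; }
    else if ((s[0] & 0xF0) == 0xE0) { val = s[0] & 0x0F; extra = 2; }
    else if ((s[0] & 0xF8) == 0xF0) { val = s[0] & 0x07; extra = 3; }
    else
        return DECODE_ERROR;               /* stray continuation or bad lead byte */

    for (i = 1; i <= extra; i++) {
        /* A NUL terminator also fails this test, so we never read
           past the end of the string. */
        if ((s[i] & 0xC0) != 0x80)
            return DECODE_ERROR;           /* truncated sequence */
        val = (val << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates and out-of-range values. */
    if (val < min_val[extra] || val > 0x10FFFF ||
        (val >= 0xD800 && val <= 0xDFFF))
        return DECODE_ERROR;

    return val;
}
```

With this, the bytes EF BF BD come back as 0xFFFD, while a truncated
EF BF comes back as DECODE_ERROR. An encoded U+FFFF in the input is
indistinguishable from an error here, which seems acceptable for a
noncharacter.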

Best, Manfred
