[NTG-context] Unicode question

Manfred Lotz manfred.lotz at arcor.de
Thu Mar 12 19:08:06 CET 2015


Hi Arthur,

On Thu, 12 Mar 2015 16:35:47 +0000
Arthur Reutenauer <arthur.reutenauer at normalesup.org> wrote:

> > The luatex code contains the lines (in unistring.w)
> > 
> >     if (val == 0xFFFD)
> >         utf_error();
> >     return (val);
> > 
> > in a function str2uni. I didn't really try to understand the code
> > but it looks as if 0xFFFD is used as "invalid marker":
> 
> Interesting.  This is not actually correct, U+FFFD is a valid Unicode
> character; it would be better to use U+FFFE or U+FFFF for that.
> 
> Note that U+FFFD is the recommended character to use when a character
> can't be recognised while converting to Unicode from another
> encoding, so its presence is usually a sign that something went wrong
> upstream, but I assume Manfred is aware of that.
> 

Yes, I'm aware of that. So I agree that it isn't correct to use
U+FFFD for this. Your suggestion of using either U+FFFE or U+FFFF
sounds good, as both are noncharacters that Unicode will never assign
to a real character.
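
To make the idea concrete, here is a minimal sketch in C of a UTF-8
decoder that reports errors with U+FFFF instead of U+FFFD. This is not
the luatex code: the names decode_utf8 and DECODE_ERROR are made up for
illustration, and I'm assuming the caller passes a pointer and a length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: U+FFFF is a noncharacter, so it never
   stands for a real character in the text being decoded. */
#define DECODE_ERROR 0xFFFFu

/* Decode one UTF-8 sequence starting at s (len bytes available).
   Returns the code point, or DECODE_ERROR for malformed input.
   Note: an input that encodes U+FFFF itself is also reported as
   an error, since it equals the sentinel. */
static uint32_t decode_utf8(const unsigned char *s, size_t len)
{
    uint32_t val, min;
    size_t need, i;

    assert(s != NULL);
    if (len == 0)
        return DECODE_ERROR;

    if (s[0] < 0x80) {
        return s[0];                      /* plain ASCII */
    } else if ((s[0] & 0xE0) == 0xC0) {
        need = 1; val = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {
        need = 2; val = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {
        need = 3; val = s[0] & 0x07; min = 0x10000;
    } else {
        return DECODE_ERROR;              /* invalid lead byte */
    }

    if (len < need + 1)
        return DECODE_ERROR;              /* truncated sequence */

    for (i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return DECODE_ERROR;          /* not a continuation byte */
        val = (val << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates and values past U+10FFFF. */
    if (val < min || val > 0x10FFFF || (val >= 0xD800 && val <= 0xDFFF))
        return DECODE_ERROR;

    return val;
}
```

With this, the bytes EF BF BD decode to 0xFFFD like any other valid
character, while a truncated or malformed sequence yields DECODE_ERROR,
so the caller can tell the two cases apart.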


-- 
Best, Manfred




