[NTG-context] Unicode question

Hans Hagen pragma at wxs.nl
Thu Mar 12 20:04:16 CET 2015


On 3/12/2015 7:08 PM, Manfred Lotz wrote:
> Hi Arthur,
>
> On Thu, 12 Mar 2015 16:35:47 +0000
> Arthur Reutenauer <arthur.reutenauer at normalesup.org> wrote:
>
>>> The luatex code contains the lines (in unistring.w)
>>>
>>> if (val == 0xFFFD)
>>>          utf_error();
>>>      return (val);
>>>
>>> in a function str2uni. I didn't really try to understand the code
>>> but it looks as if 0xFFFD is used as "invalid marker":
>>
>> Interesting.  This is not actually correct, U+FFFD is a valid Unicode
>> character; it would be better to use U+FFFE or U+FFFF for that.
>>
>> Note that U+FFFD is the recommended character to use when a character
>> can't be recognised while converting to Unicode from another
>> encoding, so its presence is usually a sign that something went wrong
>> upstream, but I assume Manfred is aware of that.
>>
>
> Yes, I'm aware of that. So I also think that it isn't correct to use
> U+FFFD for this. Your suggestion of using either U+FFFE or U+FFFF
> sounds good as both are really invalid.


It's an attempt to recover, but in the process a legitimate 0xFFFD in the
input triggers an error too. Recovering to 0xFFFD for genuinely invalid
input is OK, as TeX does that kind of thing in more cases: "I expected a }
so I insert one here" ... cross your fingers, etc.
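
To illustrate the distinction, here is a minimal sketch of a decoder that
still recovers to 0xFFFD but reports the failure through a separate flag,
so a real U+FFFD in the input is not mistaken for an error. This is not the
LuaTeX code: decode_utf8, its err parameter and REPLACEMENT_CHAR are names
made up for this example, and the real str2uni may well be organised
differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFDu

/* Hypothetical helper (not LuaTeX's str2uni): decode the UTF-8 sequence
 * that starts at s, which is assumed to be NUL-terminated.
 * On malformed input it returns U+FFFD and sets *err to 1; on success
 * *err is 0, even when the decoded character itself is U+FFFD. */
static uint32_t decode_utf8(const unsigned char *s, int *err)
{
    /* Smallest code point allowed for 0, 1, 2, 3 continuation bytes. */
    static const uint32_t min_val[] = { 0x0, 0x80, 0x800, 0x10000 };
    uint32_t val;
    int extra, i;

    assert(s != NULL && err != NULL);
    *err = 0;

    if (s[0] < 0x80) {
        return s[0];                        /* plain ASCII */
    } else if ((s[0] & 0xE0) == 0xC0) {
        val = s[0] & 0x1F; extra = 1;       /* 2-byte sequence */
    } else if ((s[0] & 0xF0) == 0xE0) {
        val = s[0] & 0x0F; extra = 2;       /* 3-byte sequence */
    } else if ((s[0] & 0xF8) == 0xF0) {
        val = s[0] & 0x07; extra = 3;       /* 4-byte sequence */
    } else {
        *err = 1;                           /* invalid lead byte */
        return REPLACEMENT_CHAR;
    }

    for (i = 1; i <= extra; i++) {
        /* A NUL terminator fails this test too, so we never read past
         * the end of a truncated sequence. */
        if ((s[i] & 0xC0) != 0x80) {
            *err = 1;
            return REPLACEMENT_CHAR;
        }
        val = (val << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates and values beyond U+10FFFF. */
    if (val < min_val[extra] || val > 0x10FFFF ||
        (val >= 0xD800 && val <= 0xDFFF)) {
        *err = 1;
        return REPLACEMENT_CHAR;
    }
    return val;
}
```

With this shape the caller checks err to decide whether to complain, and
the returned value alone no longer has to double as an "invalid" marker.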

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
                                              | www.pragma-pod.nl
-----------------------------------------------------------------

