On 3/12/2015 7:08 PM, Manfred Lotz wrote:
Hi Arthur,
On Thu, 12 Mar 2015 16:35:47 +0000 Arthur Reutenauer wrote:
The luatex code contains the lines (in unistring.w)
    if (val == 0xFFFD)
        utf_error();
    return (val);
in the function str2uni. I didn't really try to understand the code, but it looks as if 0xFFFD is used as an "invalid" marker.
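The pattern in the excerpt can be sketched as follows. This is not the actual str2uni from unistring.w: decode_naive, looks_like_error and REPLACEMENT are hypothetical names, and the sketch assumes that the decoder returns U+FFFD on malformed input and that the caller then tests for 0xFFFD, as the quoted lines do.

```c
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT 0xFFFDu /* hypothetical name for U+FFFD */

/* Hypothetical decoder: reads one 1- to 3-byte UTF-8 sequence from s.
   Demo only: 4-byte forms, overlong forms and surrogates are not checked.
   On malformed input it recovers by returning U+FFFD.
   *used receives the number of bytes consumed. */
static uint32_t decode_naive(const unsigned char *s, size_t len, size_t *used)
{
    if (len == 0) {
        *used = 0;
        return REPLACEMENT;
    }
    *used = 1;
    if (s[0] < 0x80)
        return s[0];
    if ((s[0] & 0xE0) == 0xC0 && len >= 2 && (s[1] & 0xC0) == 0x80) {
        *used = 2;
        return ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
    }
    if ((s[0] & 0xF0) == 0xE0 && len >= 3 &&
        (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
        *used = 3;
        return ((uint32_t)(s[0] & 0x0F) << 12) |
               ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
    }
    return REPLACEMENT; /* malformed: recover with U+FFFD */
}

/* Mirrors the quoted check "if (val == 0xFFFD) utf_error();". */
static int looks_like_error(uint32_t val)
{
    return val == REPLACEMENT;
}
```

Decoding the well-formed bytes EF BF BD yields a genuine U+FFFD, yet looks_like_error still reports it as an error, exactly like a malformed byte such as 0xFF. The return value alone cannot distinguish the two cases.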
Interesting. This is not actually correct: U+FFFD is a valid Unicode character. It would be better to use U+FFFE or U+FFFF for that.
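A minimal sketch of the suggested alternative, assuming U+FFFF is used as an internal "invalid" sentinel and only mapped to U+FFFD when the result is output. The names UTF_INVALID, decode_one, is_decode_error and for_output are hypothetical and do not come from the LuaTeX sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: U+FFFF is a Unicode noncharacter, so it can mark
   "malformed input" without colliding with a genuine U+FFFD. */
#define UTF_INVALID 0xFFFFu
#define REPLACEMENT 0xFFFDu

/* Decode one 1- to 3-byte UTF-8 sequence (demo only: 4-byte forms,
   overlong forms and surrogates are not handled). Returns UTF_INVALID
   on malformed input; *used receives the number of bytes consumed. */
static uint32_t decode_one(const unsigned char *s, size_t len, size_t *used)
{
    if (len == 0) {
        *used = 0;
        return UTF_INVALID;
    }
    *used = 1;
    if (s[0] < 0x80)
        return s[0];
    if ((s[0] & 0xE0) == 0xC0 && len >= 2 && (s[1] & 0xC0) == 0x80) {
        *used = 2;
        return ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
    }
    if ((s[0] & 0xF0) == 0xE0 && len >= 3 &&
        (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
        *used = 3;
        return ((uint32_t)(s[0] & 0x0F) << 12) |
               ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
    }
    return UTF_INVALID;
}

/* The error check compares against the sentinel, not against U+FFFD. */
static int is_decode_error(uint32_t val)
{
    return val == UTF_INVALID;
}

/* Recovery happens only at output time: the sentinel becomes U+FFFD. */
static uint32_t for_output(uint32_t val)
{
    return is_decode_error(val) ? REPLACEMENT : val;
}
```

With this split, a genuine U+FFFD passes through without an error. One trade-off: well-formed input that really encodes U+FFFF (EF BF BF) would now be flagged instead; a sentinel above 0x10FFFF, such as 0x110000, would avoid even that collision.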
Note that U+FFFD is the recommended character to use when a character can't be recognised while converting to Unicode from another encoding, so its presence is usually a sign that something went wrong upstream, but I assume Manfred is aware of that.
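To illustrate the recommended use of U+FFFD during conversion, here is a minimal sketch assuming a US-ASCII source encoding, where bytes 0x80 to 0xFF have no mapping. The function name ascii_to_unicode is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert US-ASCII bytes to code points. Bytes 0x80-0xFF are not ASCII,
   so each one becomes U+FFFD. out must have room for len code points.
   Returns the number of replacements, so the caller can tell that
   something went wrong upstream. */
static size_t ascii_to_unicode(const unsigned char *in, size_t len,
                               uint32_t *out)
{
    size_t replaced = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            out[i] = in[i];
        } else {
            out[i] = 0xFFFD;
            replaced++;
        }
    }
    return replaced;
}
```

Once such output reaches a later stage, a U+FFFD in it is ordinary, valid text. That is why a downstream decoder should not treat it as a decoding error.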
Yes, I'm aware of that, so I also think it isn't correct to use U+FFFD for this. Your suggestion of using either U+FFFE or U+FFFF sounds good, as both are noncharacters, permanently reserved for internal use.
It's an attempt to recover, but in the process a legitimate 0xFFFD triggers an error too. Recovering to 0xFFFD for genuinely invalid input is OK, as TeX does that in other situations as well: "I expected a }, so I insert one here ... cross your fingers", etc.

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
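One way to keep the recovery to U+FFFD while avoiding the false error on a legitimate U+FFFD is to report the error through a separate flag instead of inferring it from the returned value. A minimal sketch under that assumption; decode_recover and its err parameter are hypothetical and not part of LuaTeX.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one 1- to 3-byte UTF-8 sequence (demo only: 4-byte forms,
   overlong forms and surrogates are not handled). Malformed input is
   still recovered as U+FFFD, but *err is set to 1 only in that case,
   so a well-formed U+FFFD yields *err == 0. */
static uint32_t decode_recover(const unsigned char *s, size_t len,
                               size_t *used, int *err)
{
    *err = 0;
    if (len == 0) {
        *used = 0;
        *err = 1;
        return 0xFFFD;
    }
    *used = 1;
    if (s[0] < 0x80)
        return s[0];
    if ((s[0] & 0xE0) == 0xC0 && len >= 2 && (s[1] & 0xC0) == 0x80) {
        *used = 2;
        return ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
    }
    if ((s[0] & 0xF0) == 0xE0 && len >= 3 &&
        (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
        *used = 3;
        return ((uint32_t)(s[0] & 0x0F) << 12) |
               ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
    }
    *err = 1; /* a caller could invoke its error routine here */
    return 0xFFFD;
}
```

The caller then tests err rather than the code point, so the substituted value and the error report no longer depend on each other.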