# [NTG-context] Unicode question

Hans Hagen pragma at wxs.nl
Thu Mar 12 21:52:59 CET 2015

```
On 3/12/2015 9:41 PM, luigi scarso wrote:
>
>
> On Thu, Mar 12, 2015 at 7:55 PM, Hans Hagen <pragma at wxs.nl
> <mailto:pragma at wxs.nl>> wrote:
>
>     it's actually a bug ... it is ok to map an invalid character in the
>     input to 0xFFFD, halt and continue when permitted, but the method
>     used in luatex thereby obscures a valid 0xFFFD in the input
>
>   FFFD  REPLACEMENT CHARACTER
> • used to replace an incoming character whose
> value is unknown or unrepresentable in
> Unicode

the question is not what to do when an invalid character comes in (in
that case luatex can replace it by 0xFFFD and issue an error, as it does now),

but what to do when the input has a genuine 0xFFFD: then luatex should just
carry on, as 0xFFFD is a *valid* character

it is quite easy for a macro package to trigger an error, as

\catcode"FFFD=15

will do that, but it's impossible for a macro package to work around the
weird interception by luatex's input handler

> The meaning of FFFD is not "typeset a question mark on a black box" as in �
> (which depends on the font in any case, so in principle it's possible to see
> something completely different in a new version of the font)
> but to signal something potentially wrong, with a symbol that currently
> in most cases is �.
> Misusing the meaning is not bad per se, but in this specific case
> I think luatex is correct to be conservative and ask the user what to do;
> context --batchmode
> typesets the document,
> writes the messages to the log,
> and ends with -1, so an automatic agent is also alerted.

you cannot force a user to use \batchmode, and an exit status of -1 would
abort a wrapper, thereby leading to an invalid document; it means that luatex
can never cleanly typeset a document that contains char 0xFFFD, and luatex
should not be normative here

not accepting 0xFFFD in the input is a bug

Hans

```
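The distinction drawn in this thread can be illustrated outside TeX. The following is a minimal Python sketch, not LuaTeX's actual input handler; the function names `decode_with_replacement` and `decode_with_error_report` are hypothetical. A decoder that silently maps invalid UTF-8 to U+FFFD produces the same string for a broken byte as for a genuine, correctly encoded U+FFFD (bytes `EF BF BD`), so a later stage cannot tell the two apart. A decoder that also records where decoding failed keeps that information, so an error can be raised only for input that really was invalid.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode_with_replacement(data: bytes) -> str:
    """Lossy decoding: every invalid UTF-8 sequence silently becomes U+FFFD."""
    return data.decode("utf-8", errors="replace")


def decode_with_error_report(data: bytes) -> tuple[str, list[int]]:
    """Decode UTF-8, substituting U+FFFD for invalid sequences, but also
    return the byte offsets where decoding actually failed."""
    pieces: list[str] = []
    bad_offsets: list[int] = []
    pos = 0
    while pos < len(data):
        try:
            # Try to decode the rest of the input in one go.
            pieces.append(data[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as exc:
            # Everything before exc.start (relative to the slice) decoded fine.
            pieces.append(data[pos:pos + exc.start].decode("utf-8"))
            bad_offsets.append(pos + exc.start)
            pieces.append(REPLACEMENT)
            # Skip the invalid sequence and continue after it.
            pos += exc.end
    return "".join(pieces), bad_offsets


if __name__ == "__main__":
    genuine = "a\ufffdb".encode("utf-8")  # valid input: contains bytes EF BF BD
    broken = b"a\xffb"                     # invalid input: 0xFF never occurs in UTF-8
    # Both decode to the same text, so the lossy decoder hides the difference.
    print(decode_with_replacement(genuine) == decode_with_replacement(broken))  # True
    # The reporting decoder flags only the broken input.
    print(decode_with_error_report(genuine)[1])  # []
    print(decode_with_error_report(broken)[1])   # [1]
```

The design choice in this sketch is to keep the substitution but report it out of band; whether to halt on a reported offset is then left to the caller, roughly the split of responsibility between engine and macro package argued for above.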