On Wed, May 27, 2015 at 12:21 PM, luigi scarso <luigi.scarso@gmail.com> wrote:

On Wed, May 20, 2015 at 4:15 PM, David Kastrup <dak@gnu.org> wrote:
Hans Hagen <pragma@wxs.nl> writes:

> (Concerning parsing logs: as the cnf is under user control you cannot
> assume that the log lines are the same always, as some users can set
> them different; i always did. So log file parsers should be flexible
> in this respect.)

Standard TeX is the most fun in that respect. It wraps after 79 bytes,
never mind whether you are in the middle of a UTF-8 character or not.

That's sort of ugly to process with a UTF-8-aware system.
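
To make the problem concrete, here is a minimal sketch in Python. It is not anything TeX itself does; it is only a model of a log reader. It assumes a wrap width of 79 bytes, and the names `wrap_bytes` and `decodes_cleanly` are made up for the illustration. A line cut in the middle of a multi-byte character cannot be decoded piece by piece, but joining the raw bytes before decoding works.

```python
MAX_PRINT_LINE = 79  # assumed value; texmf.cnf can set it differently


def wrap_bytes(data: bytes, width: int = MAX_PRINT_LINE) -> list[bytes]:
    """Cut data every `width` bytes, ignoring character boundaries."""
    return [data[i:i + width] for i in range(0, len(data), width)]


def decodes_cleanly(chunk: bytes) -> bool:
    """Return True if chunk is valid UTF-8 on its own."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical message: 78 ASCII bytes, one 3-byte character, 2 more bytes.
message = "x" * 78 + "鹿" + "xx"
chunks = wrap_bytes(message.encode("utf-8"))

# The cut lands after the first byte of 鹿, so neither piece decodes alone.
broken = [decodes_cleanly(c) for c in chunks]

# Joining the raw bytes first and decoding afterwards recovers the text.
rejoined = b"".join(chunks).decode("utf-8")
```

So a UTF-8-aware reader probably has to undo the wrapping at the byte level before it decodes anything.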

In fact, I see different output in pdftex, luatex and xetex:

Hello\message{%
1xxxxxxxxxx%
2xxxxxxxxx%
3xxxxxxxxx%
4xxxxxxxxx%
5xxxxxxxxx%
6xxxxxxxxx%
7xxxxxxxxx%
8xxxxxx鹿xx%
9xxxxxxxxx%
10xxxxxxxx%
11xxxxxxxx%
12xxxxxxxx%
13xxxxxxxx%
14xxxxxxxx%
15xxxxxxxx%
16xxxxxxxx%
}
\bye

xetex and luatex correctly display 鹿, but luatex has this off-by-one "bug" that I still have to track down.
--
luigi

Ok, not a bug.

0) xetex and luatex don't break a UTF-8 sequence (or rather, at least luatex should not break a UTF-8 sequence in its output);

1) xetex shows 79 (i.e. max_print_line) Unicode characters, encoded as UTF-8, so in this case the line with 鹿 is longer than 79 bytes;

2) luatex always shows at most 79 bytes, so in this case that line is shorter.

So an application that expects at most 79 bytes per line is OK with luatex, as is an application that expects a valid UTF-8 line ending in "\n".
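
The two behaviours in 1) and 2) can be modelled roughly as follows. This is only a sketch in Python based on the description above, not the actual engine code; the function names are hypothetical and a width of 79 is assumed. It also does not try to reproduce the exact column at which either engine breaks.

```python
def wrap_code_points(text: str, width: int = 79) -> list[str]:
    """xetex-like model: count Unicode characters, not bytes."""
    return [text[i:i + width] for i in range(0, len(text), width)]


def wrap_bytes_whole_chars(text: str, width: int = 79) -> list[bytes]:
    """luatex-like model: at most `width` bytes, never splitting a character."""
    lines, current = [], b""
    for ch in text:
        encoded = ch.encode("utf-8")
        # Start a new line if this character would not fit in the byte budget.
        if current and len(current) + len(encoded) > width:
            lines.append(current)
            current = b""
        current += encoded
    if current:
        lines.append(current)
    return lines


# Hypothetical message: 78 ASCII characters, then 鹿 (3 bytes), then "xx".
message = "x" * 78 + "鹿" + "xx"

by_chars = wrap_code_points(message)
by_bytes = wrap_bytes_whole_chars(message)

# Character counting: the first line has 79 characters but 81 bytes.
first_char_line_bytes = len(by_chars[0].encode("utf-8"))

# Byte counting with whole characters: the first line stops early at 78 bytes.
first_byte_line_bytes = len(by_bytes[0])
```

In both models every line is valid UTF-8 on its own; they differ only in whether a line may exceed 79 bytes.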

luigi