On Wed, May 27, 2015 at 12:21 PM, luigi scarso
On Wed, May 20, 2015 at 4:15 PM, David Kastrup
wrote: Hans Hagen
writes: (Concerning parsing logs: as the cnf is under user control, you cannot assume that log lines always have the same length, as some users set it differently; I always did. So log file parsers should be flexible in this respect.)
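A minimal sketch of what "flexible" could mean for a log parser: the wrap width is a parameter rather than a hard-coded 79. The function name `unwrap_log` is hypothetical, and the rule "a line of exactly `max_print_line` bytes was probably wrapped" is only a heuristic (a genuine line of that length would be joined to its successor by mistake).

```python
from __future__ import annotations


def unwrap_log(raw: bytes, max_print_line: int = 79) -> list[str]:
    """Rejoin log lines that look wrapped at max_print_line bytes.

    Assumption: a physical line of exactly max_print_line bytes is the
    first part of a wrapped logical line. This is a heuristic only.
    """
    logical: list[str] = []
    pending = b""
    for line in raw.splitlines():
        pending += line
        if len(line) == max_print_line:
            continue  # probably wrapped: keep collecting bytes
        # Decode only after joining, so a split UTF-8 sequence is whole again.
        logical.append(pending.decode("utf-8", errors="replace"))
        pending = b""
    if pending:
        logical.append(pending.decode("utf-8", errors="replace"))
    return logical


raw_log = b"x" * 79 + b"\n" + b"yy\n" + b"short\n"
default_width = unwrap_log(raw_log)          # assumes wrapping at 79 bytes
custom_width = unwrap_log(raw_log, 100)      # user set a different width
```

With the default width the first two physical lines are joined; with a width of 100 nothing is joined, which is the case a parser has to allow for when the user changed the setting.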
Standard TeX is the most fun in that respect. It wraps after 79 bytes, never mind whether you are in the middle of a UTF-8 character or not.
That's sort of ugly to process with a UTF-8-aware system.
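To make the problem concrete, here is a small sketch that imitates a purely byte-oriented wrap at 79 bytes. It is not TeX's actual output routine; `wrap_bytes` is a hypothetical helper that only reproduces the effect described above.

```python
from __future__ import annotations


def wrap_bytes(text: str, width: int = 79) -> list[bytes]:
    """Cut the UTF-8 encoding of text every `width` bytes,
    ignoring character boundaries (illustration only)."""
    data = text.encode("utf-8")
    return [data[i:i + width] for i in range(0, len(data), width)]


# 77 ASCII bytes, then a three-byte character that straddles byte 79.
sample = "x" * 77 + "鹿" + "xx"
chunks = wrap_bytes(sample)

try:
    chunks[0].decode("utf-8")
    first_line_valid = True
except UnicodeDecodeError:
    # The first physical line ends in the middle of the sequence for 鹿.
    first_line_valid = False

# Joining the raw bytes before decoding recovers the original text.
rejoined = b"".join(chunks).decode("utf-8")
```

Decoding line by line fails here, while joining the bytes first and decoding afterwards works, so a UTF-8-aware reader has to treat the log as bytes until the wrapped lines are reassembled.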
In fact, I see different output in pdftex, luatex, and xetex:
Hello\message{%
1xxxxxxxxxx%
2xxxxxxxxx%
3xxxxxxxxx%
4xxxxxxxxx%
5xxxxxxxxx%
6xxxxxxxxx%
7xxxxxxxxx%
8xxxxxx鹿xx%
9xxxxxxxxx%
10xxxxxxxx%
11xxxxxxxx%
12xxxxxxxx%
13xxxxxxxx%
14xxxxxxxx%
15xxxxxxxx%
16xxxxxxxx%
}
\bye
xetex and luatex correctly display 鹿 but luatex has this off-by-one "bug" that I still have to catch. -- luigi
Ok, not a bug.
0) xetex and luatex don't break a UTF-8 sequence (or better, at least luatex should not break a UTF-8 sequence in output);
1) xetex shows 79 (i.e. max_print_line) Unicode characters in UTF-8 encoding, so in this case the line with 鹿 is longer than 79 bytes;
2) luatex always shows at most 79 bytes, so in this case that line is shorter.
So an application that expects at most 79 bytes is ok with luatex, as is an application that expects a valid UTF-8 line that ends with "\n".
-- luigi
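The two behaviours in points 1) and 2) can be sketched as follows. This is only a model of the description above, not the engines' code: `wrap_by_chars` and `wrap_by_bytes_safe` are hypothetical names, and the sketch does not try to reproduce the off-by-one mentioned earlier.

```python
from __future__ import annotations


def wrap_by_chars(text: str, width: int = 79) -> list[str]:
    """Model of point 1): count `width` Unicode characters per line."""
    return [text[i:i + width] for i in range(0, len(text), width)]


def wrap_by_bytes_safe(text: str, width: int = 79) -> list[str]:
    """Model of point 2): at most `width` bytes per line,
    never breaking inside a UTF-8 sequence."""
    lines: list[str] = []
    current: list[str] = []
    size = 0
    for ch in text:
        n = len(ch.encode("utf-8"))
        if current and size + n > width:
            lines.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current:
        lines.append("".join(current))
    return lines


sample = "x" * 70 + "鹿" + "x" * 20
by_chars = wrap_by_chars(sample)
by_bytes = wrap_by_bytes_safe(sample)

# First line under each model, measured in characters and in bytes.
chars_first = (len(by_chars[0]), len(by_chars[0].encode("utf-8")))
bytes_first = (len(by_bytes[0]), len(by_bytes[0].encode("utf-8")))
```

Under the character-counting model the first line has 79 characters but more than 79 bytes; under the byte-limited model it stays within 79 bytes and therefore has fewer characters. Both models produce lines that are valid UTF-8 on their own.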