Hi,

since I am off for Easter, I just want to spread the implementation idea for dealing with arbitrarily long input lines, so that I can get some feedback before I get to actual coding.

Since LuaTeX has its own complications to take care of with regard to UTF-8, I would actually want to prepare this as a LuaTeX patch. Backporting to pdfTeX should be straightforward.

I'll also try thinking about a place for an input encoding implementation: the process_input_buffer callback has several drawbacks. For one thing, it can't expect to get complete lines if I prepare lines piecemeal, so it would probably have to get another argument "partial". For another, the line end needs to be detected in the first place, but line-end detection _depends_ on the encoding, at least if we are talking about UTF-16 flavors. And in particular, if process_input_buffer gets partial lines, I'd like it to get lines that don't contain partial UTF-8 sequences. So I do see a need for some more stuff specific to LuaTeX, and I don't want to design something that would be hard to port to it.

So much for the background. The basic design would be the following:

buffer stays a single buffer. Its total size could probably be chosen as 32k (naturally, people will disagree here, but that will stay configurable).

Before reading material from a file, TeX will _start_ by placing the \endlinechar before any other material, as a fixed 4-byte UTF-8 sequence (alternatively, it may be stored as part of the file data structure): its setting at the time of opening the file needs to be preserved until we finally reach the end of the file.

Then the next line gets read into buffer, either until the end of the line is reached, or until the buffer read limit is hit (2k sounds reasonable; it could conceivably be configurable, since it influences things like the maximum size of \csname ...\endcsname).

When processing material, we usually check for the end-of-line condition anyway. When such a check turns out true, we do another check: is it "really" the end of the line, or just the end of the buffered part? If it is just the end of the buffered part, then sufficient material from the end of the buffered part is copied to the front of the buffer, more stuff is read in according to the buffer read limit (possibly tacking on the buffered end-line character at the end), and we resume. "Sufficient material" means the maximum of

a) 40 characters (probably 160 bytes will do) of error context for the input line context part of error messages;

b) if we are in the middle of scanning a control sequence name, everything from the beginning of the control sequence.

If this copying would not result in any more available space (making it possible to actually read in new material), we get the dreaded buffer overflow.

Basically this concept appears sound to me. It would, however, be strictly restricted to file reading. Things like \scantokens and \csname (which also use the buffer) would still require their argument to fit in one piece. But I guess that file reading covers the largest problem area.

Something like that. I hope I'll have something to show before EuroTeX. But as I said, I am away without net access (and without computer) for the next week.

All the best,

David

-- 
David Kastrup
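
P.S. To make the refill step a bit more concrete, here is a rough C sketch of what I have in mind. All the names (line_state, refill_buffer, and so on) are made up for illustration and are not actual engine identifiers; the UTF-8 handling of the preserved \endlinechar and the configurability of the sizes are left out.

#include <stdio.h>
#include <string.h>

#define BUF_SIZE    32768  /* total buffer size; configurable in the real thing */
#define READ_LIMIT   2048  /* how much new material one refill reads at most */
#define ERR_CONTEXT   160  /* bytes (~40 characters) of error context to keep */

typedef struct {
    FILE  *f;
    char   buf[BUF_SIZE];
    size_t limit;     /* end of the valid data in buf */
    size_t loc;       /* current scanning position */
    size_t cs_start;  /* start of a control sequence name being scanned,
                         or (size_t)-1 if we are not inside one */
    int    line_done; /* set once the real end of line has been read */
} line_state;

/* Called when the scanner hits buf[limit] but line_done is still 0:
   keep the error context and any partially scanned control sequence
   name, shift it to the front of the buffer, and read more bytes.
   Returns 0 on success, -1 on the dreaded buffer overflow. */
static int refill_buffer(line_state *s)
{
    size_t keep_from = (s->loc > ERR_CONTEXT) ? s->loc - ERR_CONTEXT : 0;
    if (s->cs_start != (size_t)-1 && s->cs_start < keep_from)
        keep_from = s->cs_start;   /* never cut a control sequence name in two */

    if (keep_from == 0 && s->limit == BUF_SIZE)
        return -1;                 /* nothing can be discarded and the buffer is full */

    size_t keep = s->limit - keep_from;
    memmove(s->buf, s->buf + keep_from, keep);
    s->loc   -= keep_from;
    s->limit  = keep;
    if (s->cs_start != (size_t)-1)
        s->cs_start -= keep_from;

    /* Read more material, at most READ_LIMIT bytes and never past the end
       of the buffer, stopping at a real end of line.  (Appending the
       preserved \endlinechar as a UTF-8 sequence would happen here.) */
    size_t room = BUF_SIZE - s->limit;
    if (room > READ_LIMIT)
        room = READ_LIMIT;
    while (room-- > 0) {
        int c = fgetc(s->f);
        if (c == EOF || c == '\n') {
            s->line_done = 1;
            break;
        }
        s->buf[s->limit++] = (char)c;
    }
    return 0;
}

The scanner would call refill_buffer whenever its end-of-buffer check fires while line_done is still unset; a return value of -1 maps to the usual buffer overflow error.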