2013/12/9, Philipp Stephani
It's not that easy. For Windows, you need to convert the code points to UTF-16...
or pass them without conversion, since each code point in the Basic Multilingual Plane fits in a single UTF-16 code unit. Characters beyond the Basic Multilingual Plane in filenames need not be allowed.
and then use _wfopen. For OS X and Linux, you need to convert it to UTF-8 and then call fopen. In such cases it's often easier to only store one version internally (e.g. the UTF-8 version)
or just the string of code points as it is stored internally by LuaTeX (I think it is a string of int or unsigned integers, but I can't remember now).
and then convert to the system encoding at the very edge of the program, i.e., replace all calls to fopen by a wrapper function that fans out to fopen or _wfopen depending on the operating system. I tried this once with LuaTeX, but never finished because I really underestimated the amount of work required. fopen is called from dozens of places, and there are other filesystem functions to take care of. In essence you need to replace each call to any filesystem function. There are some drop-in wrappers available, e.g. GLib (https://developer.gnome.org/glib/2.38/glib-File-Utilities.html#g-fopen).
I thought, as you once did, that the amount of work required was small. In any case, this is something that ought to have been programmed from the outset but has been left undone until now. To call it by its name, this is a bug. Writing \input whateveráéè.tex and LuaTeX not finding the file is a bug. As a Spanish speaker this is not a serious issue for me, but I wonder how people using other scripts, e.g. Greek, Russian, Hebrew, etc., and using Windows manage to get around this problem. Is it that they just don't \input files?
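For reference, the wrapper idea discussed above could look roughly like the following. This is only a minimal sketch, not LuaTeX code: it assumes file names are kept internally as UTF-8, and the names utf8_fopen and utf8_to_utf16 are made up for illustration. On Windows it converts with MultiByteToWideChar and calls _wfopen; elsewhere it passes the bytes straight to fopen.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>
#include <wchar.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to a newly
   allocated UTF-16 string. Returns NULL on invalid input or allocation
   failure; the caller frees the result. */
static wchar_t *utf8_to_utf16(const char *s)
{
    /* With a length of -1 the returned count includes the terminating NUL. */
    int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, NULL, 0);
    if (n <= 0)
        return NULL;
    wchar_t *w = malloc((size_t)n * sizeof *w);
    if (w == NULL)
        return NULL;
    if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, w, n) <= 0) {
        free(w);
        return NULL;
    }
    return w;
}
#endif

/* Hypothetical wrapper: name and mode are UTF-8 on every platform. */
FILE *utf8_fopen(const char *name, const char *mode)
{
#ifdef _WIN32
    FILE *f = NULL;
    wchar_t *wname = utf8_to_utf16(name);
    wchar_t *wmode = utf8_to_utf16(mode);
    if (wname != NULL && wmode != NULL)
        f = _wfopen(wname, wmode);
    free(wname);   /* free(NULL) is a no-op */
    free(wmode);
    return f;
#else
    /* Assumes the file system takes UTF-8 byte strings as they are. */
    return fopen(name, mode);
#endif
}
```

Every existing fopen call would then have to be changed to utf8_fopen, and the same treatment would be needed for the other filesystem functions, which is where the real amount of work lies.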