On Thursday, 12 December 2013 11:44:52, Javier Múgica de Rivera <javieraritz.ribadeo@gmail.com> wrote:

> 2013/12/9, Philipp Stephani:
>> It's not that easy. For Windows, you need to convert the code points to UTF-16...
> or pass it without conversion. Characters beyond the Basic Multilingual Plane in filenames need not be allowed.

I think they should be allowed: they are normal Unicode characters, and the BMP isn't special.
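
To illustrate why the BMP needs no special treatment in the Windows conversion, here is a minimal sketch of encoding a single code point as UTF-16. The name `encode_utf16` is hypothetical and not taken from LuaTeX; code points above U+FFFF simply become a surrogate pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of LuaTeX): encode one Unicode code point
 * as UTF-16. Writes 1 or 2 code units to out and returns that count, or
 * returns 0 if cp is not a valid Unicode scalar value. */
size_t encode_utf16(uint32_t cp, uint16_t *out)
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                                /* out of range, or a lone surrogate */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                   /* BMP: a single code unit */
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

The non-BMP branch is three lines, so restricting file names to the BMP would save almost nothing.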
>> and then use _wfopen. For OS X and Linux, you need to convert it to UTF-8 and then call fopen. In such cases it's often easier to only store one version internally (e.g. the UTF-8 version)
> or just the string of code points as it is stored internally by LuaTeX (I think it is a string of int or unsigned integers, can't remember now).

I'm not sure whether LuaTeX stores file names as an array of code points. That would, however, preclude byte strings that are not valid Unicode strings (but are legal file names on Unix).
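
As a rough illustration of the last point, the sketch below checks whether a byte string is well-formed UTF-8. The function name `is_valid_utf8` is hypothetical. A Latin-1 encoded name such as `caf\xE9.tex` is a legal file name on Unix but fails this check, so it could not be represented as an array of code points without some escape mechanism.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the n bytes at s are well-formed UTF-8,
 * otherwise 0. Rejects overlong forms, surrogates and values above U+10FFFF. */
int is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;
    assert(s != NULL);
    while (i < n) {
        unsigned char b = s[i];
        unsigned long cp, min;
        size_t extra, k;

        if (b < 0x80) { i++; continue; }                 /* ASCII */
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
        else return 0;               /* stray continuation or invalid lead byte */

        if (n - i <= extra) return 0;                    /* truncated sequence */
        for (k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;     /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;                /* overlong, out of range, or surrogate */
        i += extra + 1;
    }
    return 1;
}
```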
>> and then convert to the system encoding at the very edge of the program, i.e., replace all calls to fopen by a wrapper function that fans out to fopen or _wfopen depending on the operating system. I tried this once with LuaTeX, but never finished because I really underestimated the amount of work required. fopen is called from dozens of places, and there are other filesystem functions to take care of. In essence you need to replace each call to any filesystem function. There are some drop-in wrappers available, e.g. GLib (https://developer.gnome.org/glib/2.38/glib-File-Utilities.html#g-fopen).
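
The wrapper idea described above could look roughly like the following sketch. It assumes that file names are stored internally as UTF-8. The name `fopen_utf8` is hypothetical, and this is neither LuaTeX code nor GLib's `g_fopen` implementation. On Windows it converts with `MultiByteToWideChar` and calls `_wfopen`; elsewhere it passes the bytes straight to `fopen`.

```c
#include <assert.h>
#include <stdio.h>

#ifdef _WIN32
#include <stdlib.h>
#include <windows.h>
#endif

/* Hypothetical wrapper: path and mode are assumed to be UTF-8 on every platform. */
FILE *fopen_utf8(const char *path, const char *mode)
{
    assert(path != NULL && mode != NULL);
#ifdef _WIN32
    wchar_t wmode[16];
    wchar_t *wpath;
    FILE *f;
    int len;

    /* The first call only asks for the required length, terminator included. */
    len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, path, -1, NULL, 0);
    if (len == 0)
        return NULL;                      /* path is not valid UTF-8 */
    wpath = malloc((size_t)len * sizeof *wpath);
    if (wpath == NULL)
        return NULL;
    if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, path, -1, wpath, len) == 0 ||
        MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, 16) == 0) {
        free(wpath);
        return NULL;
    }
    f = _wfopen(wpath, wmode);            /* wide-character variant of fopen */
    free(wpath);
    return f;
#else
    return fopen(path, mode);             /* bytes are passed through unchanged */
#endif
}
```

A call site would then change from `fopen(name, "r")` to `fopen_utf8(name, "r")`. The same pattern would have to be repeated for every other filesystem function, which is where most of the work lies.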
> I thought, as you had once done, that the amount of work required was small. In any case, this is something that ought to have been programmed from the outset but has been left undone until now. To call it by its name, this is a bug. Writing \input whateveráéè.tex and LuaTeX not finding the file is a bug. As a Spanish speaker this is not a serious issue for me, but I wonder how people using different scripts, e.g. Greek, Russian, Hebrew, etc., and using Windows manage to get around this problem. Is it that they just don't \input files?

I totally agree that this is a bug. I think not supporting Unicode when the underlying system supports Unicode should always be treated as a bug. Conventional wisdom says to use only ASCII characters in file names to be portable, but honestly, we're not living in the 1960s any more.

Here is an old thread about the same problem: http://tug.org/pipermail/tex-live/2011-May/029059.html