[Dev-luatex] Filename encoding

Philipp Maximilian Stephani p.stephani2 at gmail.com
Sun Dec 29 02:07:15 CET 2013

On Thursday, 12 December 2013 11:44:52, Javier Múgica de Rivera <
javieraritz.ribadeo at gmail.com> wrote:

2013/12/9, Philipp Stephani <p.stephani2 at gmail.com>:
> It's not that easy. For Windows, you need to convert the code points to
> UTF-16...

or pass it without conversion. Characters beyond the basic multilingual
plane in filenames need not be allowed.

I think they should be allowed, they are normal Unicode characters, and the
BMP isn't special.
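
To illustrate why characters outside the BMP need no special treatment, here
is a minimal sketch in C of encoding one code point as UTF-16. The function
name utf16_encode is hypothetical and not taken from the LuaTeX sources. The
only difference for non-BMP characters is that they yield two code units (a
surrogate pair) instead of one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
   Writes 1 or 2 code units to out and returns how many were written,
   or 0 if cp is a surrogate or lies above U+10FFFF. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogates are not characters */
    if (cp > 0x10FFFF)
        return 0;                       /* outside the Unicode code space */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;          /* BMP: a single code unit */
        return 1;
    }
    cp -= 0x10000;                      /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

For example, U+00E1 (á) gives the single unit 0x00E1, while U+1D11E gives the
pair 0xD834 0xDD1E.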

> and then use _wfopen. For OS X and Linux, you need to convert it to
> UTF-8 and then call fopen. In such cases it's often easier to only store
> one version internally (e.g. the UTF-8 version)

or just the string of code points as it is stored internally by luatex
(think it is a string of int or unsigned integers, can't remember).

I'm not sure whether LuaTeX stores file names as a code point array. This
would, however, preclude byte strings that are not valid Unicode strings (but
are legal file names on Unix).
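
As a rough illustration of the distinction, the following C sketch checks
whether a byte string is well-formed UTF-8. The name is_valid_utf8 is
hypothetical. A Latin-1 encoded name such as "caf\xE9.tex" is a perfectly
legal file name on Unix, yet it fails this check and so has no representation
as a code point array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if the NUL-terminated byte string s
   is well-formed UTF-8. */
bool is_valid_utf8(const char *s)
{
    /* smallest code point that legitimately needs 1, 2, 3, 4 bytes */
    static const unsigned int min_cp[] = { 0, 0x80, 0x800, 0x10000 };
    const unsigned char *p = (const unsigned char *)s;

    while (*p) {
        unsigned int cp;
        int extra;                      /* number of continuation bytes */

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return false;              /* stray continuation or 0xF8..0xFF */
        p++;

        for (int i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80)
                return false;           /* also catches a truncated sequence */
            cp = (cp << 6) | (*p & 0x3F);
        }

        /* reject overlong forms, surrogates and values above U+10FFFF */
        if (cp < min_cp[extra]) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;
        if (cp > 0x10FFFF) return false;
    }
    return true;
}
```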

> and then convert to the
> system encoding at the very edge of the program, i.e., replace all calls
> to fopen with a wrapper function that fans out to fopen or _wfopen
> depending on the operating system. I tried this once with LuaTeX, but
> never finished because I really underestimated the amount of work
> required. fopen is called from dozens of places, and there are other
> filesystem functions to take care of. In essence you need to replace each
> call to any such function. There are some drop-in wrappers available,
> e.g. GLib (
> https://developer.gnome.org/glib/2.38/glib-File-Utilities.html#g-fopen).
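
The wrapper idea described above could look roughly like the following C
sketch. The name utf8_fopen is hypothetical, and the code assumes the file
name is held internally as UTF-8. On Windows it converts to UTF-16 with
MultiByteToWideChar and calls _wfopen. Elsewhere it passes the bytes straight
to fopen. This is only an outline under those assumptions, not what LuaTeX
actually does.

```c
#include <stdio.h>
#ifdef _WIN32
#include <stdlib.h>
#include <windows.h>
#endif

/* Hypothetical wrapper: open a file whose name is given in UTF-8. */
FILE *utf8_fopen(const char *utf8_name, const char *mode)
{
#ifdef _WIN32
    wchar_t wmode[16];
    wchar_t *wname;
    FILE *f;
    /* First call: ask how many UTF-16 code units are needed (incl. NUL). */
    int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                utf8_name, -1, NULL, 0);
    if (n <= 0)
        return NULL;                    /* name is not valid UTF-8 */
    if (MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, 16) <= 0)
        return NULL;
    wname = malloc((size_t)n * sizeof *wname);
    if (wname == NULL)
        return NULL;
    /* Second call: perform the actual conversion. */
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                        utf8_name, -1, wname, n);
    f = _wfopen(wname, wmode);
    free(wname);
    return f;
#else
    /* OS X and Linux: the byte string is handed to the system unchanged. */
    return fopen(utf8_name, mode);
#endif
}
```

Every direct call to fopen would then become a call to this wrapper, and the
same pattern would have to be repeated for the other filesystem functions.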

I thought, as you had once done, that the amount of work required was
small. In any case, this is something that ought to have been
programmed from the outset but has been left undone till now. To call
it by its name, this is a bug. Writing

\input whateveráéè.tex

and luatex not finding the file is a bug. As a Spanish speaker this is
not a serious issue for me, but I wonder how people using different
scripts, e.g. Greek, Russian, Hebrew, etc. and using Windows manage to
get around this problem. Is it that they just don't \input files?

I totally agree that this is a bug. I think not supporting Unicode when the
underlying system supports Unicode should always be treated as a bug.
Conventional wisdom says to only use ASCII characters in file names to be
portable, but honestly, we're not living in the 1960s any more.
Here is an old thread about the same problem:
