2013/12/29, David Kastrup
That's what locales are for. And Windows did not offer a working UTF8 locale last time I looked.
Indeed, it does not. From the documentation of setlocale: "The set of available languages, country/region codes, and code pages includes all those supported by the Win32 NLS API except code pages that require more than two bytes per character, such as UTF-7 and UTF-8. If you provide a code page like UTF-7 or UTF-8, setlocale will fail, returning NULL."
And there are also solutions. Applications like Emacs work well with multiple encodings, and that means that the primitives for accessing files translate the internal UTF-8-based encoding into the "filename encoding". It's perfectly fine that LuaTeX works just in UTF-8 internally, but that means that one needs a translation layer (different for different operating systems) that will, at least when used from within TeX, transparently convert UTF-8 to the filename encoding.
This layer is pretty much needed for every operating system: most operating systems, even if UTF-8-based, also tend to have a "canonical" encoding for potentially composite glyphs (e.g. a fixed Unicode normalization form).
-- David Kastrup
This layer is certainly needed for a program intended to be used
worldwide on as many operating systems as possible (or on as many as
there are volunteers to port it to).
This will work, I think, for OSes that do not support UTF-8:
UTF-8 -> wchar_t's (provide some dummy solution for values >= 2^16,
e.g. c & 0xFFFF)
wchar_t's -> chars via wcstombs()
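Here is a minimal sketch of that first step in C (the function name
utf8_to_wchars and the buffer handling are my own; it assumes
well-formed, NUL-terminated UTF-8 input):

#include <stddef.h>
#include <wchar.h>

/* Sketch of step 1: decode well-formed, NUL-terminated UTF-8 into
 * wchar_t's.  Writes at most n-1 wide characters plus a terminating
 * L'\0'.  Values >= 2^16 get the dummy treatment mentioned above
 * (c & 0xFFFF).  Returns the number of wide characters written. */
static size_t utf8_to_wchars(const char *src, wchar_t *dst, size_t n)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t out = 0;
    while (*s != '\0' && out + 1 < n) {
        unsigned long c;
        if (*s < 0x80) {                    /* 1 byte: ASCII */
            c = *s++;
        } else if ((*s & 0xE0) == 0xC0) {   /* 2-byte sequence */
            c  = (unsigned long)(*s++ & 0x1F) << 6;
            c |= (unsigned long)(*s++ & 0x3F);
        } else if ((*s & 0xF0) == 0xE0) {   /* 3-byte sequence */
            c  = (unsigned long)(*s++ & 0x0F) << 12;
            c |= (unsigned long)(*s++ & 0x3F) << 6;
            c |= (unsigned long)(*s++ & 0x3F);
        } else {                            /* 4-byte sequence */
            c  = (unsigned long)(*s++ & 0x07) << 18;
            c |= (unsigned long)(*s++ & 0x3F) << 12;
            c |= (unsigned long)(*s++ & 0x3F) << 6;
            c |= (unsigned long)(*s++ & 0x3F);
        }
        dst[out++] = (wchar_t)(c & 0xFFFF); /* dummy solution for >= 2^16 */
    }
    dst[out] = L'\0';
    return out;
}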
wcstombs() is affected by the locale settings. According to the C
language specification:
setlocale(LC_ALL, "");  // Sets the locale to the native environment.
On Windows this is the user-default ANSI code page obtained from the
operating system. On Windows there is also:
setlocale(LC_ALL, ".OCP");  // Sets the locale to the current OEM
                            // code page obtained from the OS.
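Putting the two steps together, a minimal sketch (the example file
name, the buffer sizes, and the utf8_to_wchars helper from the sketch
above are illustrative assumptions, not anyone's actual code):

#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

int main(void)
{
    wchar_t wname[260];
    char name[260 * 4];            /* room for multibyte expansion */
    size_t len;

    /* Adopt the native environment locale; on Windows this selects
     * the user-default ANSI code page. */
    setlocale(LC_ALL, "");

    /* "grüß.tex" in UTF-8; utf8_to_wchars is the sketch above. */
    utf8_to_wchars("gr\xC3\xBC\xC3\x9F.tex", wname, 260);

    /* wcstombs() returns (size_t)-1 when some character cannot be
     * represented in the locale's code page. */
    len = wcstombs(name, wname, sizeof name);
    if (len == (size_t)-1) {
        fprintf(stderr, "file name not representable in code page\n");
        return EXIT_FAILURE;
    }
    printf("locale-encoded file name: %s\n", name);
    return EXIT_SUCCESS;
}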
I had forgotten that I had once programmed this for myself a long time
ago. I just rediscovered it by removing an #include.