2013/12/29, David Kastrup
That's what locales are for. And Windows did not offer a working UTF8 locale last time I looked.
Indeed, it does not. From the documentation of setlocale: "The set of available languages, country/region codes, and code pages includes all those supported by the Win32 NLS API except code pages that require more than two bytes per character, such as UTF-7 and UTF-8. If you provide a code page like UTF-7 or UTF-8, setlocale will fail, returning NULL."
And there are also solutions. Applications like Emacs work well with multiple encodings, and that means that the primitives for accessing files translate the internal UTF-8-based encoding into the "filename encoding". It's perfectly fine that LuaTeX works just in UTF-8 internally, but that means that one needs a translation layer (different for different operating systems) that will, at least when used from within TeX, transparently convert UTF-8 to the filename encoding.
This layer is pretty much needed for every operating system: most operating systems, even if UTF-8-based, also tend to have a "canonical" encoding for potentially composite glyphs (e.g. a fixed Unicode normalization form).
-- David Kastrup
This layer is certainly needed for a program intended to be used
worldwide on as many operating systems as possible (or on as many as
there are volunteers to port it to).
This will work, I think, for OSes that do not support UTF-8:
UTF-8 -> wchar_t's (provide some dummy solution for values >= 2^16,
e.g. c & 0xFFFF)
wchar_t's -> chars via wcstombs()
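Here is a minimal sketch of that first step in C (the function name
utf8_to_wchars and the buffer handling are my own; it assumes
well-formed, NUL-terminated UTF-8 input):

#include <stddef.h>
#include <wchar.h>

/* Sketch of step 1: decode well-formed, NUL-terminated UTF-8 into
 * wchar_t's.  Writes at most n-1 wide characters plus a terminating
 * L'\0'.  Values >= 2^16 get the dummy treatment mentioned above
 * (c & 0xFFFF).  Returns the number of wide characters written. */
static size_t utf8_to_wchars(const char *src, wchar_t *dst, size_t n)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t out = 0;
    while (*s != '\0' && out + 1 < n) {
        unsigned long c;
        if (*s < 0x80) {                    /* 1 byte: ASCII */
            c = *s++;
        } else if ((*s & 0xE0) == 0xC0) {   /* 2-byte sequence */
            c  = (unsigned long)(*s++ & 0x1F) << 6;
            c |= (unsigned long)(*s++ & 0x3F);
        } else if ((*s & 0xF0) == 0xE0) {   /* 3-byte sequence */
            c  = (unsigned long)(*s++ & 0x0F) << 12;
            c |= (unsigned long)(*s++ & 0x3F) << 6;
            c |= (unsigned long)(*s++ & 0x3F);
        } else {                            /* 4-byte sequence */
            c  = (unsigned long)(*s++ & 0x07) << 18;
            c |= (unsigned long)(*s++ & 0x3F) << 12;
            c |= (unsigned long)(*s++ & 0x3F) << 6;
            c |= (unsigned long)(*s++ & 0x3F);
        }
        dst[out++] = (wchar_t)(c & 0xFFFF); /* dummy solution for >= 2^16 */
    }
    dst[out] = L'\0';
    return out;
}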
wcstombs() is affected by the locale settings. According to the C
language specification:
setlocale(LC_ALL, "");  // Sets the locale to the native environment.
On Windows this is the user-default ANSI code page obtained from the
operating system. On Windows there is also:
setlocale(LC_ALL, ".OCP");  // Sets the locale to the current OEM
                            // code page obtained from the OS.
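Putting the two steps together, a minimal sketch (the example file
name, the buffer sizes, and the utf8_to_wchars helper from the sketch
above are illustrative assumptions, not anyone's actual code):

#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

int main(void)
{
    wchar_t wname[260];
    char name[260 * 4];            /* room for multibyte expansion */
    size_t len;

    /* Adopt the native environment locale; on Windows this selects
     * the user-default ANSI code page. */
    setlocale(LC_ALL, "");

    /* "grüß.tex" in UTF-8; utf8_to_wchars is the sketch above. */
    utf8_to_wchars("gr\xC3\xBC\xC3\x9F.tex", wname, 260);

    /* wcstombs() returns (size_t)-1 when some character cannot be
     * represented in the locale's code page. */
    len = wcstombs(name, wname, sizeof name);
    if (len == (size_t)-1) {
        fprintf(stderr, "file name not representable in code page\n");
        return EXIT_FAILURE;
    }
    printf("locale-encoded file name: %s\n", name);
    return EXIT_SUCCESS;
}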
I had forgotten that I had once programmed this for myself a long time
ago. I just rediscovered it by removing an #include.