Re: [Dev-luatex] Filename encoding

7 Jan 2014

      Javier Múgica de Rivera  writes:
...
...
...
utf8 -> w_char's     (Provide some dummy solution for values >2^16,
e.g.  c & 0xFFFF)
Don't use the dummy solution; Windows uses UTF-16.
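
To illustrate the point about not truncating, here is a minimal sketch of the difference between masking with 0xFFFF and emitting a UTF-16 surrogate pair. It is written in Python for brevity and is not LuaTeX code; the function names utf8_to_utf16_units and truncate_units are hypothetical and exist only for this example.

```python
def utf8_to_utf16_units(data):
    """Decode UTF-8 bytes and return the list of UTF-16 code units."""
    units = []
    for ch in data.decode("utf-8"):
        cp = ord(ch)
        if cp < 0x10000:
            # Code points in the Basic Multilingual Plane fit in one unit.
            units.append(cp)
        else:
            # Anything above U+FFFF is split into a high and a low surrogate.
            cp -= 0x10000
            units.append(0xD800 | (cp >> 10))
            units.append(0xDC00 | (cp & 0x3FF))
    return units


def truncate_units(data):
    """The 'dummy solution': keep only the low 16 bits of each code point."""
    return [ord(ch) & 0xFFFF for ch in data.decode("utf-8")]


# A file name containing U+1D11E (MUSICAL SYMBOL G CLEF), outside the BMP.
name = "a\U0001D11E.tex".encode("utf-8")

good = utf8_to_utf16_units(name)
bad = truncate_units(name)

print([hex(u) for u in good])
print([hex(u) for u in bad])
```

With the surrogate pair, the name survives a round trip through UTF-16. With the mask, U+1D11E silently becomes U+D11E, which is a different and perfectly valid character, so the name would refer to a different file.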
...
w_char's -> chars via wcstombs()
No, Windows uses UTF-16. This step is unnecessary and harmful.
You know a good deal more than me about Windows internals. I thought
that within w_char strings on Windows each character represented
itself.
wcstombs() is affected by the locale settings. According to the C
...
The opposite is true: Windows never uses locale information for filenames
(it always uses UTF-16 de facto), but the locale is used on Linux.
I supposed it was used BOTH on Windows and Linux, but that on Linux it
was never necessary due to it using UTF-8 naturally.
Linux does not use "UTF-8" naturally as far as I can tell.  It uses a
null-terminated byte sequence.  It's the job of the application to
encode that byte sequence in a manner where files will not get lost.

There might be some "external" file systems (like CD file systems or
vfat) with a translation layer for file names that assumes that this byte
sequence is UTF-8 as opposed to the sequence used on the disk.

But the "native" file systems will likely be transparent, and file
systems written on a basically latin-1 system will show strange
characters in file names when used on a basically utf-8 system.
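
The effect described above can be sketched without touching a real file system. The following Python fragment is only an illustration under stated assumptions: the sample name and the variable names are made up, and the bytes are handled in memory rather than read from a directory.

```python
# The same visible name, stored as two different byte sequences.
name = "M\u00fagica.tex"                 # contains U+00FA (u with acute)
raw_latin1 = name.encode("latin-1")      # what a latin-1 system would write
raw_utf8 = name.encode("utf-8")          # what a utf-8 system would write

# To a transparent file system these are simply two distinct names.
print(raw_latin1)
print(raw_utf8)

# A utf-8 system displaying the latin-1 bytes cannot decode the 0xFA byte,
# so it shows a replacement character instead of the intended letter.
shown = raw_latin1.decode("utf-8", errors="replace")
print(shown)

# An application that wants to avoid losing the file has to keep the
# original bytes recoverable. Python's "surrogateescape" error handler is
# one way to do that: undecodable bytes round-trip unchanged.
as_text = raw_latin1.decode("utf-8", errors="surrogateescape")
restored = as_text.encode("utf-8", errors="surrogateescape")
```

The round trip at the end is the important part: whatever the application shows to the user, it must be able to hand the original byte sequence back to the kernel.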

-- 
David Kastrup