Hi,
I have already solved the command line issue, understanding that it is
inappropriate for LuaTeX to try to guess anything in this respect. But
now I have run into another problem, which has nothing to do with the
command line.
If, within a utf-8 file, say test.tex, I write
\input Canción.tex
I get this output:
>luaplainJA asd.tex
This is LuaTeX, Version beta-0.76.0-2013052306 (rev 4627)
(./test.tex
! I can't find file `Canci├│n.tex'.
l.1 \input{Canci├│n.tex}
Please type another input file name:
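LuaTeX finds the file if I rename it to the 8-bit representation of
Canción.tex, that is, the name obtained by reading each byte of the
UTF-8 encoding as a separate character in my OS code page (so ó becomes
the two characters for the bytes 0xC3 0xB3). I wondered what would
happen if I wrote a filename using characters outside my OS default
locale, so I created the file Ωδη.tex and within test.tex I wrote
\input Ωδη.tex
But the problem persists. The file is found if I rename it to the
8-bit representation of Ωδη.tex, in which each Greek letter becomes two
characters.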
This is something different from luatex assuming all its input is
utf-8. This is assuming that the underlying OS uses utf-8 for
filenames.
I searched the code and found the relevant function, the one that
parses the argument of \input. It is in filename.w. The doc says:
@ In order to isolate the system-dependent aspects of file names, the
@^system dependencies@> system-independent parts of \TeX\ are expressed
in terms of three system-dependent procedures called |begin_name|,
|more_name|, and |end_name|. In essence, if the user-specified
characters of the file name are $c_1\ldots c_n$, the
system-independent driver program does the operations
$$|begin_name|;\,|more_name|(c_1);\,\ldots\,;\,|more_name|(c_n); \,|end_name|.$$
The function scan_file_name includes
    void scan_file_name(void)
    {
        [...]
        if (cur_chr > 127) {
            unsigned char *bytes;
            unsigned char *thebytes;
            thebytes = uni2str((unsigned) cur_chr);
            bytes = thebytes;
            while (*bytes) {
                if (!more_name(*bytes))
                    break;
                bytes++;
            }
            xfree(thebytes);
        }
        [...]
    }

    static boolean more_name(ASCII_code c)
    {
        [...]
        append_char(c); /* contribute |c| to the current string */
        [...]
    }
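To make the effect of that loop concrete, here is a minimal sketch of it in isolation. It is not LuaTeX's code: encode_utf8 is a hypothetical stand-in for uni2str, and build_name is a made-up driver that plays the role of the repeated more_name calls. It only illustrates that a character above 127 reaches the file name as its individual UTF-8 bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for uni2str (not LuaTeX's implementation):
   writes the UTF-8 encoding of code point cp into out, which must hold
   at least 5 bytes, NUL-terminates it, and returns the byte count.
   Returns 0 for surrogates and values beyond U+10FFFF. */
static size_t encode_utf8(unsigned cp, unsigned char *out)
{
    size_t n = 0;
    if (cp < 0x80) {
        out[n++] = (unsigned char) cp;
    } else if (cp < 0x800) {
        out[n++] = (unsigned char) (0xC0 | (cp >> 6));
        out[n++] = (unsigned char) (0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        if (cp < 0xD800 || cp > 0xDFFF) {
            out[n++] = (unsigned char) (0xE0 | (cp >> 12));
            out[n++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
    } else if (cp < 0x110000) {
        out[n++] = (unsigned char) (0xF0 | (cp >> 18));
        out[n++] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[n++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[n++] = (unsigned char) (0x80 | (cp & 0x3F));
    }
    out[n] = 0;
    return n;
}

/* Hypothetical analogue of the begin_name/more_name/end_name sequence:
   appends the UTF-8 bytes of each code point to name (capacity cap,
   including the terminating NUL) and returns the length in bytes. */
static size_t build_name(const unsigned *cps, size_t count,
                         unsigned char *name, size_t cap)
{
    size_t len = 0;
    size_t i;
    if (cap == 0)
        return 0;
    for (i = 0; i < count; i++) {
        unsigned char buf[5];
        size_t n = encode_utf8(cps[i], buf);
        if (n == 0 || len + n >= cap)
            break;              /* invalid code point or buffer full */
        memcpy(name + len, buf, n);
        len += n;
    }
    name[len] = 0;
    return len;
}
```

For the 11 code points of Canción.tex, build_name yields 12 bytes, with ó contributed as 0xC3 0xB3. An fopen that interprets those bytes in an 8-bit code page sees two characters at that position.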
The C standard does not have a wide-character equivalent of fopen, but
I suppose all current compilers provide one. Visual Studio's is _wfopen
(its arguments are wide-character strings; otherwise _wfopen and fopen
behave identically). Wouldn't it be easier to use that function,
rather than breaking each character into its UTF-8 bytes and presuming
that fopen and the OS will interpret them properly?
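A rough sketch of what I have in mind, assuming the name is still built as UTF-8 internally and is converted only at the point where the file is opened. The names fopen_utf8 and utf8_to_wide are made up for this illustration and are not part of LuaTeX. On Windows the sketch relies on MultiByteToWideChar and _wfopen; elsewhere it simply falls back to fopen.

```c
#include <stdio.h>
#include <stdlib.h>
#ifdef _WIN32
#include <windows.h>
#include <wchar.h>

/* Hypothetical helper: converts a NUL-terminated UTF-8 string into a
   newly allocated wide (UTF-16) string. Returns NULL on invalid input
   or allocation failure; the caller frees the result. */
static wchar_t *utf8_to_wide(const char *s)
{
    wchar_t *w;
    /* First call asks for the required length, including the NUL. */
    int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                s, -1, NULL, 0);
    if (n <= 0)
        return NULL;
    w = malloc((size_t) n * sizeof *w);
    if (w != NULL &&
        MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            s, -1, w, n) == 0) {
        free(w);
        w = NULL;
    }
    return w;
}
#endif

/* Hypothetical wrapper: opens a file whose name is given in UTF-8.
   On Windows the name and mode are converted to wide strings and
   passed to _wfopen; on other systems the bytes go to fopen as-is. */
static FILE *fopen_utf8(const char *name, const char *mode)
{
#ifdef _WIN32
    FILE *f = NULL;
    wchar_t *wname = utf8_to_wide(name);
    wchar_t *wmode = utf8_to_wide(mode);
    if (wname != NULL && wmode != NULL)
        f = _wfopen(wname, wmode);
    free(wname);
    free(wmode);
    return f;
#else
    return fopen(name, mode);
#endif
}
```

With something like this, the more_name loop could stay as it is, and only the final open would need to know that the OS wants wide-character names.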