[Dev-luatex] Filename encoding
Hans Hagen
pragma at wxs.nl
Sun Dec 29 12:44:35 CET 2013
On 12/29/2013 10:10 AM, Khaled Hosny wrote:
> On Sun, Dec 29, 2013 at 01:07:15AM +0000, Philipp Maximilian Stephani wrote:
>> but honestly, we're not living in the 1960s any more.
>
> No, we are not, but Windows is.
I always wonder why folks need comments like this. I can come up with
Linux aspects that are 1960s too. More and more I tend to ignore discussions
(and mails) that have this OS-bad-this-or-that undertone. (And I've left
mailing lists because of it.) If Windows were that bad, then why do desktop
builders try to mimic it? Much is a matter of getting accustomed to things.
Anyway, if at some point utf16 had become the favourite (on Linux) we
would be in bigger trouble, as it can have many zero bytes in strings. At
least Windows could serve multiple GUI languages rather early, so we have
to live with some aspects (large companies wouldn't like sudden changes
and want to use programs for decades). FWIW: it's comparable to (MySQL)
database content, where different assumptions about what bytes represent
can give weird side effects. It's about mutual agreements.
Lua(TeX) is rather neutral with respect to what bytes go into a
filename: if I save some data using a utf8 filename (from Lua for
instance) I can perfectly well reload that file. Some applications will
show proper (utf8) names; others, like 'dir' in the console, will show
the bytes as e.g. latin. Not much different from what one gets when one logs
into a remote machine with a different terminal setup.
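
The point about filenames being "just bytes" can be illustrated with a small sketch. It is in Python rather than Lua, the filename is made up for the example, and latin-1 stands in for whatever 8-bit code page a console tool might assume. It is an illustration under those assumptions, not a description of what LuaTeX does internally.

```python
import os
import tempfile

# Hypothetical filename, chosen only for illustration.
name = "résumé.txt"
raw = name.encode("utf-8")  # the byte sequence that ends up in the filename

# A tool that assumes an 8-bit encoding shows the very same bytes differently.
# latin-1 is an assumption here; a real console may use another code page.
as_latin1 = raw.decode("latin-1")

# Saving and reloading through the same byte sequence is lossless.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(os.fsencode(tmp), raw)  # bytes path, no translation
    with open(path, "wb") as f:
        f.write(b"some data")
    with open(path, "rb") as f:
        reloaded = f.read()

print(as_latin1)
print(reloaded)
```

The file itself is never damaged; only the display differs, depending on what the viewing program assumes the bytes mean.
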
Which reminds me: last week I entered a Lua interactive console on
Ubuntu and magically ^3 was turned into the Unicode superscript 3
character ... so, talking of a mess-up ... to some extent I can
understand such default behaviour, so I'll live with it.
It's cut 'n paste and the assumptions of other applications that (at least
on Windows) can turn something utf8 into something looking weird. It's
really not much different from typesetting a utf8 encoded document in a
TeX that expects 8-bit texnansi encoding. The typeset stream looks weird
but is in fact honest utf8, visualized.
Of course we could introduce an abstract filename object (including all
the attributes that relate to a file) but it's not really a solution.
Simply converting utf8 encoded filenames into utf16 doesn't work out
well because in between we use C strings, and these have this '60s
property of being zero-terminated, so in practice one ends up with
utf16 names clipped to length 1.
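
The clipping effect can be reproduced with a short sketch. It uses Python's ctypes module to mimic a zero-terminated C string; the filename and the helper name through_c_string are hypothetical, and little-endian utf16 is assumed because that is the byte order Windows uses.

```python
import ctypes

name = "abc.tex"                   # hypothetical filename
utf16 = name.encode("utf-16-le")   # every ASCII character gets a zero byte
utf8 = name.encode("utf-8")        # no zero bytes for ordinary text


def through_c_string(data: bytes) -> bytes:
    """Return what survives when data is read back as a C string."""
    buf = ctypes.create_string_buffer(data)  # copies data and appends a NUL
    return buf.value                         # stops at the first zero byte


print(through_c_string(utf16))  # only the first character survives
print(through_c_string(utf8))   # the whole name survives
```

For a name made of ASCII characters the second byte of the utf16 form is already zero, so code that treats the buffer as a C string sees a name of length 1, while the utf8 form passes through untouched.
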
When one mixes applications in a workflow on Windows it is important to
make sure that one doesn't get code page translations in the way.
Annoying indeed, but using computers is full of annoyances. You don't
want to know what troubles we sometimes have with graphics coming from
Apple infrastructures to Linux infrastructure where users. Filenames are
always a bit of an issue.
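
A code page translation getting in the way might look like the following sketch. The pair cp1252 (a common Windows "ANSI" code page) and cp850 (a common console "OEM" code page) is an assumption chosen for illustration, as is the filename; other systems use other pairs.

```python
name = "é.pdf"  # hypothetical filename

# One application stores the name using the ANSI code page ...
ansi_bytes = name.encode("cp1252")

# ... and another interprets the same bytes using the OEM code page.
shown_in_console = ansi_bytes.decode("cp850")

# Nothing is lost as long as nobody "repairs" the name: reversing the
# two steps gives back the original.
recovered = shown_in_console.encode("cp850").decode("cp1252")

print(shown_in_console == name)  # False: the name looks different
print(recovered == name)         # True: the bytes were never changed
```

Trouble starts when a program in the chain saves the misread name back to disk, because from then on the bytes really are different.
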
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
| www.pragma-pod.nl
-----------------------------------------------------------------