[Dev-luatex] Very simple sample?

Javier Bezos lists at texytipografia.com
Sat Dec 9 14:25:41 CET 2006


Taco:

Thank you very much. Things are now clearer. One of
my goals is to see if my Mem package may be used in
luatex and how to make use of the new features.

Javier

=======================

>> 
>>>From the excerpt:
>> 
>>>From now on, whenever \LUATEX\ has to open a text file, it will call
>>>the function \type{file_opener} instead of actually opening the file
>>>itself. It stores the returned table in its memory, and it uses the
>>>function attached to the \type{reader} label for reading lines.
>> 
>> 
>> If I've understood correctly, this applies to files not
>> yet opened, but usually the encoding is stated inside
>> the file (i.e., the file is already open).
> 
> That is not a problem, because *you* are the one opening the
> file; it is completely under your control. Assume for a moment
> if you will that all files begin with a first line that contains a
> statement like this:
> 
>   % encoding=iso-8859-2
> 
> Here is an example of how you could extract that information
> from the files, without confusing the rest of the system
> (-- is a line comment that you can use in pure .lua files):
> 
>   -- input:  a file object
>   -- output: a string representing that file's encoding
>   function find_file_encoding (f)
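>     -- read a line (nil if the file is empty)
>     local line = f:read()
>     -- reset the file offset  (not really needed in this case)
>     f:seek("set",0)
>     -- an empty file has no encoding line, return a default
>     if line == nil then
>       return "iso-8859-1"
>     end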
>     -- search for encoding
>     -- %w == all alphanumerics,
>     -- %- = a literal dash
>     local fchar, lchar, match = line:find("encoding=([%w%-]+)")
>     if fchar == nil then
>       -- no encoding found, return a default
>       return "iso-8859-1"
>     else
>       return match
>     end
>   end
> 
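The pattern can be checked on its own in a plain Lua interpreter, outside LuaTeX. This is only a minimal sketch: the helper name `encoding_of_line` is made up for illustration and does not appear in the original code; it applies the same pattern and the same default to a single string instead of a file object.

```lua
-- Hypothetical helper: extract the encoding name from one line,
-- using the same pattern as find_file_encoding.
local function encoding_of_line (line)
  -- string.match returns the first capture, or nil when nothing matches
  return line:match("encoding=([%w%-]+)") or "iso-8859-1"
end

-- a line carrying the statement, and one without it
print(encoding_of_line("% encoding=iso-8859-2"))
print(encoding_of_line("\\input plain"))
```

The second call has no `encoding=` statement, so it falls back to the default.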
> You now have to hook this new function into 'file_opener', like so:
> 
>   function file_opener (fname)
>     local f = io.open(fname)
>     if f == nil then
>       return nil
>     else
>       local encoding = find_file_encoding(f)
>       local readline = function ()
>         local line = f:read()
>         if line == nil  then
>           return nil
>         else
>           return latin_to_utf(line, encoding)
>         end
>       end
>       return { reader = readline }
>     end
>   end
> 
> Now you know the file encoding and can make decisions based
> on that information (by changing 'latin_to_utf', see below).
> 
>> But where is the input encoding? Apparently this changes
>> the "Unicode" representation from 8 bits (thus limited to
>> the range 0-255, which is certainly latin-1) to utf-8,
>> without reencoding anything (say, iso greek, koi8, macos,
>> jis, etc.). I've googled for docs on unicode for lua but
>> I haven't found anything particularly useful. 
> 
> An 8-bit encoding is nothing more than a mapping of 256 byte
> values into unicode code points. In the simplest case, this
> is an identity map, and the only difference is in file format
> representation (that is what happened in my original example).
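To make the identity case concrete: with latin-1 the byte value already is the code point, and only the byte-level representation changes. The sketch below is plain Lua and assumes nothing from LuaTeX; `utf8_from_byte` is a hypothetical name, and it only handles code points below 0x800, which is all the identity map over 0-255 needs.

```lua
-- Hypothetical helper, not part of LuaTeX: encode one code point
-- below 0x800 as UTF-8 (enough for the identity map over 0-255).
local function utf8_from_byte (c)
  if c < 0x80 then
    -- ASCII stays a single byte
    return string.char(c)
  else
    -- two bytes: 110xxxxx 10xxxxxx
    local high = math.floor(c / 64)
    local low  = c % 64
    return string.char(0xC0 + high, 0x80 + low)
  end
end

-- latin-1 byte 0xE9 (e with acute) becomes the two bytes C3 A9
local s = utf8_from_byte(0xE9)
print(s:byte(1), s:byte(2))
```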
> 
> In a somewhat less trivial case, there is an array of 256 values.
> Such an array could look like this:
> 
>   -- table values are borrowed from ConTeXt.
>   encodings = {
>     ["iso-8859-2"] = { [0] =
>         0x0000, 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007,
>         --
>         -- 240 other entries
>         --
>         0x0159, 0x016F, 0x00FA, 0x0171, 0x00FC, 0x00FD, 0x0163, 0x02D9
>     }
>   }
> 
> 
> Having this table, we can now rewrite the 'latin_to_utf' function:
> 
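>   function latin_to_utf (line,enc)
>     -- look up the mapping table once per line, not once per byte
>     local map = encodings[enc]
>     local s = ""
>     -- string.bytes is a LuaTeX extension to the string library
>     for c in string.bytes(line) do
>       if map ~= nil then
>         s = s .. unicode.utf8.char(map[c])
>       else
>         -- default is pass-through
>         s = s .. unicode.utf8.char(c)
>       end
>     end
>     return s
>   end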
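If a table lists only some of the 256 entries, the lookup returns nil for the rest. One possible variation, sketched here in plain Lua, falls back to the byte value for missing entries. The names `partial` and `bytes_to_codepoints` are hypothetical, and the function returns code points rather than a UTF-8 string so that it does not depend on any LuaTeX library.

```lua
-- Hypothetical partial table: only entries that differ from the
-- byte value itself (two real iso-8859-2 mappings, for illustration).
local partial = {
  [0xF8] = 0x0159,  -- r with caron
  [0xF9] = 0x016F,  -- u with ring above
}

-- Map each byte of `line` to a code point; bytes missing from `map`
-- (or all bytes, when `map` is nil) pass through unchanged.
local function bytes_to_codepoints (line, map)
  local out = {}
  for i = 1, #line do
    local b = line:byte(i)
    out[#out + 1] = (map and map[b]) or b
  end
  return out
end

local cps = bytes_to_codepoints("a\248\249", partial)
print(cps[1], cps[2], cps[3])
```

Keeping only the differing entries makes the table shorter, at the cost of one extra check per byte.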
> 
> The resulting lua code is in the attached .lua file (with the
> full table, of course).
> 
> For 16-bit encodings etc. the remapping is more complex of course,
> but this example should hopefully be enough to give you an idea
> of how to approach it.
> 
> There is one big caveat I should warn about: because the current LuaTeX
> is essentially a merge of Aleph and pdfTeX, you almost certainly need
> an OTP to convert the resulting unicode values back to font encodings.
> And that problem is why the font and hyphenation subsystems need to be
> tackled next, before anything else. That is what I'll start on next
> Monday.
> 
> Best,
> 
> Taco
>

