Taco: Thank you very much. Things are now clearer. One of my goals is to see if my Mem package may be used in luatex, and how to make use of the new features.

Javier

=======================
From the excerpt:
From now on, whenever \LUATEX\ has to open a text file, it will call the function \type{file_opener} instead of actually opening the file itself. It stores the returned table in its memory, and it uses the function attached to the \type{reader} label for reading lines.
If I've understood correctly, this applies to files that are not yet opened, but usually the encoding is stated inside the file itself (i.e., the file is already open by the time you can read it).
That is not a problem, because *you* are the one opening the file; it is completely under your control. Assume for a moment, if you will, that every file begins with a first line that contains a statement like this:
% encoding=iso-8859-2
Here is an example of how you could extract that information from the files, without confusing the rest of the system (-- is a line comment that you can use in pure .lua files):
-- input: a file object
-- output: a string representing that file's encoding
function find_file_encoding (f)
  -- read a line
  local line = f:read()
  -- reset the file offset (not really needed in this case)
  f:seek("set",0)
  -- search for encoding
  -- %w == all alphanumerics,
  -- %- = a literal dash
  local fchar, lchar, match = line:find("encoding=([%w%-]+)")
  if fchar == nil then
    -- no encoding found, return a default
    return "iso-8859-1"
  else
    return match
  end
end
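As a quick check of my own (not part of the original example), you can see what the pattern captures by running it on the sample first line from above in a stand-alone Lua interpreter:

-- the pattern applied to the sample line from above:
print(("% encoding=iso-8859-2"):find("encoding=([%w%-]+)"))
-- prints: 3    21    iso-8859-2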
You now have to hook this new function into 'file_opener', like so:
function file_opener (fname)
  local f = io.open(fname)
  if f == nil then
    return nil
  else
    local encoding = find_file_encoding(f)
    local readline = function ()
      -- read the next raw line and convert it to utf-8
      local line = f:read()
      if line == nil then
        return nil
      else
        return latin_to_utf(line, encoding)
      end
    end
    return { reader = readline }
  end
end
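If you want to try this outside of TeX first, a small stand-alone test could look like this (the file name is just a placeholder of mine):

-- stand-alone test of the opener (hypothetical file name):
local t = file_opener("somefile.tex")
if t ~= nil then
  print(t.reader())   -- first line, already converted to utf-8
  print(t.reader())   -- second line, and so on
end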
Now you know the file encoding and can make decisions based on that information (by changing 'latin_to_utf', see below).
But where is the input encoding? Apparently this changes the "Unicode" representation from 8 bits (thus limited to the range 0-255, which is certainly latin-1) to utf-8, without re-encoding anything (say, iso greek, koi8, macos, jis, etc.). I've googled for docs on Unicode support in Lua, but I haven't found anything particularly useful.
An 8-bit encoding is nothing more than a mapping of 256 byte values into unicode code points. In the simplest case, this is an identity map, and the only difference is in file format representation (that is what happened in my original example).
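To make the identity case concrete, here is a small sketch of my own (not part of the original example): a 256-entry table in which each byte value maps to the code point with the same number, which is exactly what latin-1 looks like.

-- identity map: byte value c maps to code point c (this is latin-1)
local identity = { }
for c = 0, 255 do
  identity[c] = c
end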
In a somewhat less trivial case, there is an array of 256 values. Such an array could look like this:
-- table values are borrowed from ConTeXt.
encodings = {
  ["iso-8859-2"] = {
    [0] =
    0x0000, 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007,
    --
    -- 240 other entries
    --
    0x0159, 0x016F, 0x00FA, 0x0171, 0x00FC, 0x00FD, 0x0163, 0x02D9
  }
}
Having this table, we can now rewrite the 'latin_to_utf' function:
function latin_to_utf (line,enc)
  local s = ""
  for c in string.bytes(line) do
    if encodings[enc] ~= nil then
      s = s .. unicode.utf8.char(encodings[enc][c])
    else
      -- default is pass-through
      s = s .. unicode.utf8.char(c)
    end
  end
  return s
end
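A quick sanity check of my own (assuming the full iso-8859-2 table from the attached file is loaded): byte 0xF8 in iso-8859-2 is U+0159 (r with caron), which becomes the two-byte utf-8 sequence 0xC5 0x99.

-- byte 0xF8 (iso-8859-2) -> U+0159 -> utf-8 bytes 0xC5 0x99
local utf = latin_to_utf(string.char(0xF8), "iso-8859-2")
print(#utf)            -- 2
print(utf:byte(1,2))   -- 197   153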
The resulting lua code is in the attached .lua file (with the full table, of course).
For 16-bit encodings etc. the remapping is more complex of course, but this example should hopefully be enough to give you an idea of how to approach it.
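Just to sketch the idea (my own illustration, not something LuaTeX provides; it ignores surrogate pairs and byte-order marks): a big-endian 16-bit line could be decoded by combining pairs of bytes into code points.

-- minimal sketch: decode a big-endian 16-bit (UCS-2) line into utf-8;
-- no surrogate pair or byte-order mark handling
function ucs2be_to_utf (line)
  local s = ""
  for i = 1, #line - 1, 2 do
    local hi, lo = line:byte(i, i + 1)
    s = s .. unicode.utf8.char(hi * 256 + lo)
  end
  return s
end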
There is one big caveat I should warn you about: because the current LuaTeX is essentially a merge of Aleph and pdfTeX, you almost certainly need an OTP to convert the resulting unicode values back to font encodings. That problem is why the font and hyphenation subsystems need to be tackled next, before anything else, which is what I'll start on next Monday.
Best,
Taco