Hello,

I'm trying to use \directlua, but until now all my attempts have failed. I began with:

\documentclass{book}
\begin{document}
\directlua0{tex.print("\uppercase{hello}");}
\end{document}

and then wrapped \directlua in every possible combination of \texscantokens, \luaescapestring, \scantokens and so on, but nothing seems to work (no errors, but I get "uppercase" printed, perhaps with dashes and quotes, and of course "hello" still in lower case).

Please, could you provide an example showing what I should do? Are \directlua and tex.print the right commands to use?

Thanks,
Javier
Javier Bezos wrote:
Hello,
I'm trying to use \directlua, but until now all my attempts have failed. I began with:
\documentclass{book}
\begin{document}
\directlua0{tex.print("\uppercase{hello}");}
\end{document}
and then wrapped \directlua in every possible combination of \texscantokens, \luaescapestring, \scantokens and so on, but nothing seems to work (no errors, but I get "uppercase" printed, perhaps with dashes and quotes, and of course "hello" still in lower case).
Please, could you provide an example showing what I should do? Are \directlua and tex.print the right commands to use?
Yes, but TeX interprets commands inside the \directlua argument, and on top of that Lua interprets \ in strings, so you need a way to get two backslashes into the tex.print argument. This is the simplest solution:

\let\\\relax
\directlua0{tex.print("\\uppercase{hello}");}

But you may need something else in LaTeX, where \\ has built-in semantics.

Best,
Taco
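A minimal sketch of the two escaping levels, runnable as plain Lua (the TeX side is assumed to have already delivered the braced argument unchanged):

-- Lua's string scanner collapses \\ to a single backslash, so
-- tex.print would hand "\uppercase{hello}" back to TeX.
local s = "\\uppercase{hello}"
print(#s)  --> 17 (one backslash plus the 16 characters of uppercase{hello})
print(s)   --> \uppercase{hello}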
Taco:
This is the simplest solution:
\let\\\relax
\directlua0{tex.print("\\uppercase{hello}");}
:-/ Too ad hoc, but it helped, as I now understand what's going on. So I tried:

\directlua0\expandafter{%
  \detokenize{tex.print("\\section{Hola $\\sin a_0^2$}");}}

This seems to work.

Javier
Another question:
How can I read arbitrary TeX input? Or is this part
still unfinished? Currently, LuaTeX only accepts
UTF-8 input, right? But let's assume we have:
Javier Bezos wrote:
Another question:
How can I read arbitrary TeX input? Or is this part still unfinished? Currently, LuaTeX only accepts UTF-8 input, right? But let's assume we have:
\arabic{<iso arabic text>}
That is kind of impossible, unless you hook Lua code into \arabic, or let \arabic insert information that can then be used while preprocessing the lines. Preprocessing (at all levels of TeX) is under development; by the end of this year / beginning of next year, i/o will be stable; around EuroTeX, OpenType fonts plus manipulation of fonts will be available.
i.e., two input encodings in the same line. I still don't understand how encodings work.
LuaTeX takes only UTF-8, and if you want something else, you can recode each input line using a callback function (or use an external program to preprocess the input), but mixed input encodings will demand adaptation of the macros used for (e.g.) Arabic. pdfTeX 1.* will be the pdfTeX that deals with 8-bit input; pdfTeX 2.* will be the UTF variant. User-visible TeX code for version 2.0+ quite certainly will be different at the lower level.

Hans
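A minimal sketch of such per-line recoding. One hedge: "process_input_buffer" is the modern LuaTeX name for the per-line input hook, which postdates the interface discussed in this thread; string.bytes and unicode.utf8.char are LuaTeX extensions.

-- Recode every input line from latin-1 to UTF-8 before TeX sees it.
-- Assumption: the "process_input_buffer" callback name is the later,
-- stable API; the callback mechanism here was still in flux.
callback.register("process_input_buffer", function(line)
  local s = ""
  for b in string.bytes(line) do   -- iterate over the raw bytes
    s = s .. unicode.utf8.char(b)  -- latin-1 bytes map 1:1 to code points
  end
  return s                         -- TeX reads the recoded line
end)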
Hi Javier (and Hans),
LuaTeX takes only UTF-8, and if you want something else, you can recode each input line using a callback function (or use an external program to preprocess the input), but mixed input encodings will demand adaptation of the macros used for (e.g.) Arabic.
Of course there are a few things not implemented yet (in particular, we plan to have advanced support for in-line encoding/transliteration changes by making it possible to intercept the token builder, but that is not there yet). However, the practical problem is not so much unfinished things in the executable as missing documentation and Lua script code. I'll be writing a section on input encodings for the manual tomorrow, and that should help clear things up. Please hang on for a bit longer until that is finished.

Best,
Taco
Hi Javier, Taco Hoekwater wrote:
Lua script code. I'll be writing a section on input encodings for the manual tomorrow, and that should help clear things up. Please hang on for a bit longer until that is finished.
Attached is a provisional excerpt of the coming book on LuaTeX. It is ConTeXt source, but it should not be hard to get it to compile under LaTeX or plain if you don't have ConTeXt installed.

An easy way to solve the Lua-code-in-TeX-code chaos is to put the bulk of the Lua code in a separate file, and only have e.g.

\directlua 0 { dofile("demo.lua") }

in your document source. Anyway, I hope it makes sense to you.

Best,
Taco
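For instance, the earlier \uppercase example could live entirely in the external file; the contents below are a sketch:

-- demo.lua: a pure Lua file, so only Lua's own \\ escaping applies;
-- TeX never scans these characters, and no \detokenize tricks are needed.
tex.print("\\uppercase{hello}")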
Taco:
Attached is a provisional excerpt of the coming book on LuaTeX. It is ConTeXt source, but it should not be hard to get it to compile under LaTeX or plain if you don't have ConTeXt installed.
I have ConTeXt installed (in fact, the full TeX Live 2005). So, no problem.
From the excerpt:
From now on, whenever \LUATEX\ has to open a text file, it will call the function \type{file_opener} instead of actually opening the file itself. It stores the returned table in its memory, and it uses the function attached to the \type{reader} label for reading lines.
If I've understood correctly, this applies to files not yet opened, but usually the encoding is stated inside the file (i.e., by then the file is already open).
function latin_to_utf (line)
  local s = ""
  for c in string.bytes(line) do
    s = s .. unicode.utf8.char(c)
  end
  return s
end
But where is the input encoding? Apparently this changes the "Unicode" representation from 8 bits (thus limited to the range 0-255, which is essentially Latin-1) to UTF-8, without reencoding anything (say, ISO Greek, KOI8, Mac OS, JIS, etc.). I've googled for docs on Unicode for Lua, but I haven't found anything particularly useful.

Javier
Hi Javier, Javier Bezos wrote:
Taco:
From the excerpt:
From now on, whenever \LUATEX\ has to open a text file, it will call the function \type{file_opener} instead of actually opening the file itself. It stores the returned table in its memory, and it uses the function attached to the \type{reader} label for reading lines.
If I've understood correctly, this applies to files not yet opened, but usually the encoding is stated inside the file (i.e., by then the file is already open).
That is not a problem, because *you* are the one opening the file; it is completely under your control. Assume for a moment, if you will, that all files begin with a first line that contains a statement like this:

% encoding=iso-8859-2

Here is an example of how you could extract that information from the files, without confusing the rest of the system (-- is a line comment that you can use in pure .lua files):

-- input: a file object
-- output: a string representing that file's encoding
function find_file_encoding (f)
  -- read a line
  local line = f:read()
  -- reset the file offset (not really needed in this case)
  f:seek("set",0)
  -- search for the encoding:
  --   %w == all alphanumerics,
  --   %- == a literal dash
  local fchar, lchar, match = line:find("encoding=([%w%-]+)")
  if fchar == nil then
    -- no encoding found, return a default
    return "iso-8859-1"
  else
    return match
  end
end

You now have to hook this new function into 'file_opener', like so:

function file_opener (fname)
  local f = io.open(fname)
  if f == nil then
    return nil
  else
    local encoding = find_file_encoding(f)
    local readline = function ()
      local line = f:read()
      if line == nil then
        return nil
      else
        return latin_to_utf(line, encoding)
      end
    end
    return { reader = readline }
  end
end

Now you know the file encoding and can make decisions based on that information (by changing 'latin_to_utf', see below).
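As a quick check outside TeX, the returned table can be exercised by hand with texlua (the file name is hypothetical, and find_file_encoding, file_opener, and a two-argument latin_to_utf are assumed to be loaded):

-- Pull lines through the reader just as LuaTeX would.
local t = file_opener("chapter1.tex")
if t then
  local line = t.reader()
  while line do
    print(line)            -- each line arrives already recoded to UTF-8
    line = t.reader()
  end
end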
But where is the input encoding? Apparently this changes the "Unicode" representation from 8 bits (thus limited to the range 0-255, which is essentially Latin-1) to UTF-8, without reencoding anything (say, ISO Greek, KOI8, Mac OS, JIS, etc.). I've googled for docs on Unicode for Lua, but I haven't found anything particularly useful.
An 8-bit encoding is nothing more than a mapping of 256 byte values onto Unicode code points. In the simplest case, this is an identity map, and the only difference is in the file format representation (that is what happened in my original example).

In a somewhat less trivial case, there is an array of 256 values. Such an array could look like this:

-- table values are borrowed from ConTeXt.
encodings = {
  ["iso-8859-2"] = {
    [0] = 0x0000, 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007,
    --
    -- 240 other entries
    --
    0x0159, 0x016F, 0x00FA, 0x0171, 0x00FC, 0x00FD, 0x0163, 0x02D9
  }
}

Having this table, we can now rewrite the 'latin_to_utf' function:

function latin_to_utf (line,enc)
  local s = ""
  for c in string.bytes(line) do
    if encodings[enc] ~= nil then
      s = s .. unicode.utf8.char(encodings[enc][c])
    else
      -- default is pass-through
      s = s .. unicode.utf8.char(c)
    end
  end
  return s
end

The resulting Lua code is in the attached .lua file (with the full table, of course). For 16-bit encodings etc. the remapping is more complex, of course, but this example should hopefully be enough to give you an idea of how to approach it.

There is one big caveat I should warn about: because the current LuaTeX is essentially a merge of Aleph and pdfTeX, you almost certainly need an OTP to convert the resulting Unicode values back to font encodings. And that problem is why the font and hyphenation subsystems need to be tackled next, before anything else. Which is what I'll start on next Monday.

Best,
Taco
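A brief sanity check, assuming the full 256-entry table from the attached file is loaded (the excerpt above shows only the first and last eight entries):

-- Byte 0xF9 is U+016F (u with ring above) in iso-8859-2:
print(latin_to_utf(string.char(0xF9), "iso-8859-2"))
-- An unknown encoding name falls through to the identity map:
print(latin_to_utf("hello", "x-unknown"))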
Taco:

Thank you very much. Things are now clearer. One of my goals is to see whether my Mem package can be used in LuaTeX, and how to make use of the new features.

Javier
Javier Bezos wrote:
Taco:
Thank you very much. Things are now clearer. One of my goals is to see whether my Mem package can be used in LuaTeX, and how to make use of the new features.
That is great, please keep us posted. Being forced to write these tutorial-like things is actually very good, so do not feel embarrassed about asking for clarifications.

Best,
Taco
Taco (and Hans):
Of course there are a few things not implemented yet (in particular, we plan to have advanced support for in-line encoding/transliteration changes by making it possible to intercept the token builder, but that is not there yet).
OK, thank you. I've begun making tests combining OCPs and Lua. The TeX file is:

=============================
\documentclass{book}

\def\lua#1{\directlua0\expandafter{\detokenize{#1}}}

\ocp\luaocp=lua
\ocplist\lualist=\addbeforeocplist 1 \luaocp \nullocplist

\def\dolua#1{\lua{
  print('(#1)');
  tex.sprint('(#1)');
}}

\begin{document}
//{\pushocplist\lualist text and text}//
\end{document}
=============================

and the OTP file (lua.otp) is:

=============================
input: 1;
output: 1;
expressions:
. => "\dolua{" \1 "}";
=============================

which just wraps every letter inside \dolua{ }. However, the letters are silently ignored (though "print" prints them to the console, so Lua is working). I presume the problem is that "tex.print" acts like a line in the TeX file, while OCPs are applied after expansion, so by the time \dolua is executed it is too late and the line is sent nowhere.

I'm aware the road map says the third stage will implement "token filtering (aka Translation Processes)". I'm just reporting my tests in case they are useful.

Javier

PS. By the way, I'm using LuaTeX on Windows XP with TeX Live 2005 and a modified latex.ltx.
Javier Bezos wrote:
which just wraps every letter inside \dolua{ }. However, the letters are silently ignored (though "print" prints them to the console, so Lua is working). I presume the problem is that "tex.print" acts like a line in the TeX file, while OCPs are applied after expansion, so by the time \dolua is executed it is too late and the line is sent nowhere.
That is roughly what happens, yes. I have to look into this at some point, because silently disappearing output is a bit unfriendly, but the precise interaction between OTP processing and input states is not the easiest bit of TeX to comprehend, so it may take a while.

Best,
Taco
Taco:
which just wraps every letter inside \dolua{ }. However, the letters are silently ignored (though "print" prints them to the console, so Lua is working). I presume the problem is that "tex.print" acts like a line in the TeX file, while OCPs are applied after expansion, so by the time \dolua is executed it is too late and the line is sent nowhere.

That is roughly what happens, yes. I have to look into this at some point, because silently disappearing output is a bit unfriendly, but the precise interaction between OTP processing and input states is not the easiest bit of TeX to comprehend, so it may take a while.
OK. What I was wondering was whether the whole OCP mechanism could be replaced by Lua scripts. Since Lua can be embedded in TeX, the transformations would be under the control of TeX, rather than sitting in external files that cannot be modified at run time. So I intended to write an OTP passing the OCP buffer directly to Lua. Of course, that has some disadvantages, since the result may need further expansion to reprocess it. Or perhaps not? This is what I'd like to investigate.

Javier
Javier Bezos wrote:
OK. What I was wondering was whether the whole OCP mechanism could be replaced by Lua scripts. Since Lua can be
On the todo list is an item about writing Lua scripts to interpret OTP files directly, but that is at least a few months away.

Best,
Taco
Taco:
OK. What I was wondering was whether the whole OCP mechanism could be replaced by Lua scripts. Since Lua can be

On the todo list is an item about writing Lua scripts to interpret OTP files directly, but that is at least a few months away.
And what about applying Lua scripts at the point where OCPs are applied? Remember, one of the goals of OCPs is to apply transformations _after_ expansion, so that:

\def\charf{f}
\def\chari{i}
\charf\chari

could be properly handled as "fi". This is very important in "contextual" scripts like Arabic or Devanagari.

BTW, I was even wondering whether that would allow fixing the \string and \char issues (I would say bugs) in the OCP mechanism.

Javier
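A toy sketch (not the OCP machinery itself) of why such a pass must run after expansion, runnable with texlua:

-- By the time a post-expansion filter sees the text, \charf\chari has
-- already become the two characters "fi", so a contextual rule can
-- match across the former macro boundary.
local function ligate(s)
  return (s:gsub("fi", unicode.utf8.char(0xFB01)))  -- U+FB01 fi ligature
end
print(ligate("define"))  --> "de" .. fi-ligature .. "ne"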
Javier Bezos wrote:
Taco:
OK. What I was wondering was whether the whole OCP mechanism could be replaced by Lua scripts. Since Lua can be

On the todo list is an item about writing Lua scripts to interpret OTP files directly, but that is at least a few months away.

And what about applying Lua scripts at the point where OCPs are applied?
Yes, of course. It would be pointless otherwise. Taco
Javier Bezos wrote:
Taco:
which just wraps every letter inside \dolua{ }. However, the letters are silently ignored (though "print" prints them to the console, so Lua is working). I presume the problem is that "tex.print" acts like a line in the TeX file, while OCPs are applied after expansion, so by the time \dolua is executed it is too late and the line is sent nowhere.

That is roughly what happens, yes. I have to look into this at some point, because silently disappearing output is a bit unfriendly, but the precise interaction between OTP processing and input states is not the easiest bit of TeX to comprehend, so it may take a while.

OK. What I was wondering was whether the whole OCP mechanism could be replaced by Lua scripts. Since Lua can be embedded in TeX, the transformations would be under the control of TeX, rather than sitting in external files that cannot be modified at run time. So I intended to write an OTP passing the OCP buffer directly to Lua. Of course, that has some disadvantages, since the result may need further expansion to reprocess it. Or perhaps not? This is what I'd like to investigate.
Lua callbacks will be available for each stage: input, after expansion, at various stages of list building, par building, page building, shipout, etc.

Hans
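A hedged sketch of what "one hook per stage" could look like, using modern LuaTeX callback names (this thread predates the final API); each stub simply passes data through:

callback.register("process_input_buffer", function(line)
  return line    -- input stage: recode or rewrite the line here
end)
callback.register("pre_linebreak_filter", function(head, groupcode)
  return true    -- list-building stage: true keeps the node list unchanged
end)
callback.register("buildpage_filter", function(info)
  -- page-building stage: inspect or adjust page contributions here
end)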
participants (4):
- Hans Hagen
- Javier Bezos
- root@aanhet.net
- Taco Hoekwater