Hi, this is a long thread and too many things are discussed at the same time, so I will need some time to read and understand what is going on.
My first thought is that some small modifications to \showlists and \showbox will help a lot. It's easy to write additional info like the dimensions of each item in the list, or, in the case of characters, the filename of the tfm together with the font size (or we may write the dimensions of each char as Hans suggested, but this is overkill IMHO).
My feeling is that we need to work out the specification and format of the ``node list'' first. In the first step, I would prefer to have only node-specific things, e.g. only what comes out after a box construction. I also got a similar request: to provide a primitive that writes out the content of a box and another primitive to reconstruct that box from the output. We can start with this and make further extensions later on.
At the moment I cannot see clearly what is needed, but I am willing to write some extensions that we can experiment with, to see what is really needed and perhaps change what has been done.
Thanh
On Tue, Sep 20, 2005 at 11:58:08AM +0200, Karel Skoupy wrote:
Hi all,
as Hans has already mentioned, my concern with luatex is to have some interface/protocol for formatting the TeX stuff externally.
Without going into details now, I'm interested in alternative algorithms for formatting not only paragraphs but the whole stream. For TUG 2005 I wrote a prototype which doesn't use any TeX code at all (it just parasitizes on ADvi code for getting some metric information and showing the results). For a long time I planned to make a whole new system from scratch, but for several reasons that was reconsidered, and Hans proposed a way (a plugin mechanism for an external engine) to cooperate with TeX, so that TeX could benefit from the new algorithm and I can concentrate on the core stuff.
So basically I need a stream of (character) boxes, glues, penalties, ... (is there a simple unambiguous notion for all that?) in a preprocessed form (I don't care about input and macro handling) plus some parameters (standard paragraph breaking parameters and the new special ones) and I will return a stream of fixed boxes.
'I' will often mean 'the engine' depending on the context :-)
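To make the "stream in, fixed boxes out" idea concrete, here is a minimal Python sketch of such a stream and of a placeholder engine. The node classes and first_fit are hypothetical names invented for illustration; first_fit is a plain greedy breaker (not Knuth-Plass and not the new algorithm) that uses glue at its natural width and ignores stretch and shrink.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical node types for the stream TeX would hand to the engine.
@dataclass
class Char:
    font: str     # TeX font identifier, e.g. "tenrm"
    code: int     # character code within that font
    width: float  # width in pt, from the font metrics

@dataclass
class Glue:
    width: float
    stretch: float = 0.0
    shrink: float = 0.0

@dataclass
class Penalty:
    value: int    # -10000 forces a break

Node = Union[Char, Glue, Penalty]

def first_fit(stream: List[Node], hsize: float) -> List[List[Node]]:
    """Placeholder engine: greedy first-fit breaking at glue.

    Glue is used at its natural width, glue at a break is dropped, and
    penalties other than forced breaks are ignored. Each inner list is
    the content of one output line box.
    """
    lines: List[List[Node]] = []
    line: List[Node] = []
    width = 0.0
    for i, node in enumerate(stream):
        if isinstance(node, Penalty):
            if node.value <= -10000:
                lines.append(line)
                line, width = [], 0.0
        elif isinstance(node, Glue):
            # Width of the word that follows this glue.
            word_width = 0.0
            for nxt in stream[i + 1:]:
                if not isinstance(nxt, Char):
                    break
                word_width += nxt.width
            if line and width + node.width + word_width > hsize:
                lines.append(line)          # break here; the glue vanishes
                line, width = [], 0.0
            elif line:
                line.append(node)
                width += node.width
        else:
            line.append(node)
            width += node.width
    if line:
        lines.append(line)
    return lines
```

The real engine would of course also return the glue setting of each line; the sketch only shows the direction of the data flow.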
In the first stage, I won't need lua (or any changes to TeX) at all. I plan to use \showlists for my input stream and to generate a standard TeX input file for reading the result back. Of course, it won't be so simple: there will be some macro programming and trickery, which will make the whole thing complicated, fragile, unreliable, and inefficient for real use. Therefore some hooks in the actively developed TeX will probably be useful for making the cooperation of TeX and the external engine smooth. It might use lua or not, we will see; in any case I would like to keep the plugin support generic and (complete but) minimal.
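For the prototype route, the engine's answer could be written back as ordinary TeX input. The sketch below assumes (hypothetically) that the engine returns each line as a list of characters and already-set spaces, and emits one \hbox to <hsize> per line, using \char for the characters and \kern for the fixed glue. The item format and function names are made up for this example.

```python
def tex_for_line(items, hsize_pt):
    """Turn one fixed line into a TeX \\hbox of the given width.

    items: ("char", font_id, code) or ("kern", width_pt). Glue is assumed
    to have been set by the engine already, so it arrives as a fixed kern.
    """
    parts = []
    current_font = None
    for item in items:
        if item[0] == "char":
            _, font, code = item
            if font != current_font:      # switch fonts only when needed
                parts.append("\\" + font)
                current_font = font
            parts.append("\\char%d" % code)
        elif item[0] == "kern":
            parts.append("\\kern%.5fpt" % item[1])
        else:
            raise ValueError("unsupported item: %r" % (item,))
    return "\\hbox to %.5fpt{%s}" % (hsize_pt, "".join(parts))

def tex_input_file(lines, hsize_pt):
    """One \\hbox per line; in vertical mode TeX stacks them using \\baselineskip."""
    return "".join(tex_for_line(line, hsize_pt) + "\n" for line in lines)
```

For instance, a line with H, i, a 3.5pt space and a bold ! becomes \hbox to 100.00000pt{\tenrm\char72\char105\kern3.50000pt\tenbf\char33}.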
I will now list the aspects of the communication between TeX and the engine which I have thought of so far. I will be glad if you can just think about it for the moment and give me some feedback if you like.
* single paragraph stuff
I need:
(1) a complete representation of all the stuff which is to be returned formatted
(2) the sizes of all the objects which are involved in formatting
(3) the properties which influence the formatting (breakable, discardable, ...)
It seems that the standard output of \showlists (or \showbox) will mostly do. (1) is fulfilled I guess (the returned input needs to be only slightly modified to fit TeX).
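A sketch of reading that output back, assuming \showboxdepth and \showboxbreadth have been raised so that TeX does not truncate the listing. It covers only the line formats for characters, glue, kerns and penalties (e.g. .\tenrm H, .\glue 3.33333 plus 1.66666 minus 1.11111, .\kern0.27779); everything else is passed through as "other". The function names are made up for this example.

```python
import re

GLUE = re.compile(r"^\\glue(?:\(\\(\w+)\))? (-?[\d.]+)"
                  r"(?: plus (-?[\d.]+)(fil+)?)?(?: minus (-?[\d.]+)(fil+)?)?$")
KERN = re.compile(r"^\\kern ?(-?[\d.]+)")   # font kerns print without the space
PENALTY = re.compile(r"^\\penalty (-?\d+)$")
CHAR = re.compile(r"^\\(\w+) (.+)$")         # "\tenrm H": font id, then the char
NOT_FONTS = {"discretionary", "leaders", "cleaders", "xleaders"}

def classify(text):
    m = GLUE.match(text)
    if m:
        name, width, stretch, stretch_order, shrink, shrink_order = m.groups()
        return {"type": "glue", "name": name, "width": float(width),
                "stretch": float(stretch or 0), "stretch_order": stretch_order or "",
                "shrink": float(shrink or 0), "shrink_order": shrink_order or ""}
    m = KERN.match(text)
    if m:
        return {"type": "kern", "width": float(m.group(1))}
    m = PENALTY.match(text)
    if m:
        return {"type": "penalty", "value": int(m.group(1))}
    m = CHAR.match(text)
    if m and m.group(1) not in NOT_FONTS:
        return {"type": "char", "font": m.group(1), "char": m.group(2)}
    return {"type": "other", "text": text}    # boxes, rules, math, ...

def parse_showbox(lines):
    """Each line's leading dots give its nesting depth (0 = the box itself)."""
    nodes = []
    for raw in lines:
        depth = len(raw) - len(raw.lstrip("."))
        node = classify(raw[depth:])
        node["depth"] = depth
        nodes.append(node)
    return nodes
```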
(2) is a little bit tricky, because for the characters I get only an id of the font. So I will need to know the exact reference to a real font to get the metric information. This can be learned by e.g. \show\tenrm. But of course it is not known in advance which fonts are used in the paragraph, so either all fonts can be listed at the beginning (but where do I get the list of all font definitions? And the definitions can actually change in the middle of the paragraph), or I can make a first pass, collect the font ids, and ask for them in the second pass. That will be a bit tricky and won't be reliable due to redefinitions (I can also change the current id using \let and lose the old id, which is still used in the log, right?), so it will be OK for experimenting, but for a real version I will need better support from TeX.
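The two-pass approach for (2) might look like the sketch below: pass 1 collects the font identifiers from the character lines of the \showbox output, pass 2 puts \show\<id> for each of them into the input and reads TeX's answers of the form "> \tenrm=select font cmr10." (or "... at 12.0pt.") from the log. The function names are hypothetical, and the redefinition problem described above remains: the answer describes the identifier at \show time.

```python
import re

# A character line in \showbox output, e.g. ".\tenrm H" (dots = nesting depth).
CHAR_LINE = re.compile(r"^\.+\\(\w+) \S")
NOT_FONTS = {"glue", "kern", "penalty", "discretionary",
             "leaders", "cleaders", "xleaders"}
# TeX's answer to \show\tenrm, e.g. "> \tenrm=select font cmr10 at 12.0pt."
SHOW_FONT = re.compile(r"^> \\(\w+)=select font (\S+?)(?: at ([\d.]+)pt)?\.$")

def font_ids(showbox_lines):
    """Pass 1: font identifiers in order of first use."""
    ids = []
    for line in showbox_lines:
        m = CHAR_LINE.match(line)
        if m and m.group(1) not in NOT_FONTS and m.group(1) not in ids:
            ids.append(m.group(1))
    return ids

def show_requests(ids):
    """TeX code for pass 2: ask TeX about each identifier."""
    return "".join("\\show\\" + i for i in ids)

def parse_show(log_lines):
    """Map identifier -> (tfm name, at size in pt, or None for the design size)."""
    fonts = {}
    for line in log_lines:
        m = SHOW_FONT.match(line)
        if m:
            size = float(m.group(3)) if m.group(3) else None
            fonts[m.group(1)] = (m.group(2), size)
    return fonts
```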
(3) is implicit, right?
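(3) does seem mostly implicit: whether an item is a breakpoint or discardable follows from its type and its neighbours. A simplified sketch of the TeXbook rules (glue right after a non-discardable item, a kern followed by glue, a penalty below 10000), ignoring math and discretionaries; the Node class is hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Node:
    kind: str       # "char", "box", "glue", "kern" or "penalty"
    value: int = 0  # penalty value; unused for other kinds

# Items that vanish at a line break (math nodes would belong here too).
DISCARDABLE = {"glue", "kern", "penalty"}

def legal_breakpoints(nodes: List[Node]) -> List[int]:
    """Indices of legal breakpoints for glue, kerns and penalties."""
    points = []
    for i, node in enumerate(nodes):
        if node.kind == "glue":
            # Glue is a breakpoint only right after a non-discardable item.
            if i > 0 and nodes[i - 1].kind not in DISCARDABLE:
                points.append(i)
        elif node.kind == "kern":
            # A kern is a breakpoint only when glue follows it.
            if i + 1 < len(nodes) and nodes[i + 1].kind == "glue":
                points.append(i)
        elif node.kind == "penalty" and node.value < 10000:
            # Penalties of 10000 or more forbid a break.
            points.append(i)
    return points
```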
* stream of paragraphs
I may even need the whole chapter, because I want to treat:
- shapes and layouts, which are relative to the page and not to a particular paragraph
- pagination and float placement
For basic experimenting I can redefine \par to something like \hfil\break\indent, but that will restrict all kinds of things which can happen between the paragraphs (in vertical mode). Of course, the whole thing will never be compatible with TeX, because TeX expects that after \par the last paragraph has been formatted and placed on the vertical list. So it will be the responsibility of the user/macro programmer to bear the consequences of using the alternative mechanism. Nevertheless, the consequences should be as small as possible.
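With \par redefined as \hfil\break\indent, each paragraph end shows up in the list as fil glue followed by \penalty -10000, so the engine could cut the stream back into paragraphs roughly as sketched below (nodes simplified to hypothetical (kind, value) pairs). It also shows the restriction mentioned above: a \hfil\break typed by the user inside a paragraph looks exactly the same.

```python
from typing import List, Tuple

# e.g. ("char", "a"), ("box", "indent"), ("penalty", -10000);
# glue carries its stretch order ("" or "fil").
Node = Tuple[str, object]

# What \hfil\break leaves in the list: fil glue, then a forced-break penalty.
BOUNDARY = [("glue", "fil"), ("penalty", -10000)]

def split_paragraphs(nodes: List[Node]) -> List[List[Node]]:
    """Cut one long horizontal list back into paragraphs at \\hfil\\break."""
    paragraphs: List[List[Node]] = []
    current: List[Node] = []
    i = 0
    while i < len(nodes):
        if nodes[i:i + 2] == BOUNDARY:
            paragraphs.append(current)
            current = []
            i += 2          # drop the boundary pair itself
        else:
            current.append(nodes[i])
            i += 1
    if current:
        paragraphs.append(current)
    return paragraphs
```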
So for prototyping I can redefine \par, or perhaps I can store whole paragraphs in infinitely wide hboxes (redefining \hsize?), or maybe I can use some \specials for tagging; but for the production version, this will be a very tricky part. Not so much for the engine, but mainly on the TeX side. It should be of great concern to people who want to use the new algorithms in their systems (Hans?), after those ideas are first tested by a prototype :-).
* passing the parameters specific to the new algorithms
- layouts, shapes
- maybe others, like weights for resolving paragraph versus page breaking
This will be a new thing so I hope that there is no compatibility burden.
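One possible shape for the parameter block handed to the engine: TeX's existing line-breaking parameters (the defaults below are plain TeX's values) plus new ones. The class, the key=value format and the two new fields (layout, page_vs_par_weight) are hypothetical placeholders for the layout/shape and paragraph-versus-page weights mentioned above.

```python
from dataclasses import dataclass, asdict

@dataclass
class BreakParams:
    hsize: float                      # line width in pt
    # Standard TeX parameters, with plain TeX defaults.
    pretolerance: int = 100
    tolerance: int = 200
    linepenalty: int = 10
    hyphenpenalty: int = 50
    exhyphenpenalty: int = 50
    adjdemerits: int = 10000
    doublehyphendemerits: int = 10000
    finalhyphendemerits: int = 5000
    looseness: int = 0
    # Hypothetical parameters for the new algorithms (names invented here).
    layout: str = ""
    page_vs_par_weight: float = 1.0

def serialize(params: BreakParams) -> str:
    """One key=value pair per line, in field order."""
    return "\n".join("%s=%s" % (k, v) for k, v in asdict(params).items())
```

Keeping the new fields in the same record as the old ones means the protocol needs no separate channel for them, which fits the "no compatibility burden" point.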
* hyphenation
It will be a lot of additional work, but I think that I should handle it locally. There are two reasons:
(1) the protocol for failing and getting the list with new discretionaries (TeX's 2nd pass) for every individual paragraph would be extremely complicated; in the end it might be more difficult than handling it locally.
(2) TeX's hyphenation mechanism is IMHO one of the crappiest parts of TeX. I mean the way the (non)ligatures are screwed up for discretionaries which are not used in the end. So if it is handled locally, it will be IMO simpler and more correct. There are also some research results concerning hyphenation which are not implemented in TeX because that would be too complicated.
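To illustrate point (2): if the engine chooses the breaks itself, it can form ligatures after deciding where a word is broken, instead of precomputing pre-break, post-break and no-break variants in a discretionary. The sketch uses a toy ligature table (ff, fi, fl, ffi, ffl) instead of a font's real lig/kern program, and it is not TeX's algorithm.

```python
# Toy ligature table, longest first so "ffi" wins over "ff" and "fi".
LIGATURES = ["ffi", "ffl", "ff", "fi", "fl"]

def ligate(text):
    """Greedy longest-match ligature formation on a plain string."""
    glyphs, i = [], 0
    while i < len(text):
        for lig in LIGATURES:
            if text.startswith(lig, i):
                glyphs.append(lig)
                i += len(lig)
                break
        else:
            glyphs.append(text[i])
            i += 1
    return glyphs

def set_word(word, hyphen_at=None):
    """Typeset a word, optionally broken (with a hyphen) after position hyphen_at."""
    if hyphen_at is None:
        return [ligate(word)]
    return [ligate(word[:hyphen_at] + "-"), ligate(word[hyphen_at:])]
```

For example, set_word("office", 2) gives [["o", "f", "-"], ["fi", "c", "e"]], while the unbroken word keeps the ffi ligature.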
At the first stage, I'll omit hyphenation completely.
At the moment, I don't remember anything else. I'm looking forward to your feedback.
--ksk
_______________________________________________
Dev-luatex mailing list
Dev-luatex@ntg.nl
http://www.ntg.nl/mailman/listinfo/dev-luatex