[Dev-luatex] plugin for external formatting

20 Sep 2005

      Hi all,

as Hans has already mentioned, my concern with luatex is to have some
interface/protocol for formatting the TeX stuff externally.

Without going into details now, I'm interested in alternative algorithms for
formatting not only paragraphs, but the whole stream. For TUG 2005 I wrote
a prototype which doesn't use any TeX code at all (it just parasitizes
ADvi code for getting some metric information and showing the results).
For a long time I planned to build a whole new system from scratch, but for
several reasons I reconsidered, and Hans proposed a way to cooperate with
TeX (a plugin mechanism for an external engine), so that TeX can benefit
from the new algorithm and I can concentrate on the core stuff.

So basically I need a stream of (character) boxes, glues, penalties, ...
(is there a simple unambiguous notion for all that?) in a preprocessed
form (I don't care about input and macro handling) plus some parameters
(standard paragraph breaking parameters and the new special ones) and I
will return a stream of fixed boxes.

'I' will often mean 'the engine' depending on the context :-)

In the first stage, I won't need lua (or any changes to TeX) at all. I
plan to use \showlists for my input stream and to generate a standard
TeX input file for reading the result back. Of course, it won't be that
simple: there will be some macro programming and trickery, which will
make the whole thing complicated, fragile, unreliable, and inefficient for
real use. Therefore some hooks from the actively developed TeX will
probably be useful for making the cooperation of TeX and the external
engine smooth. It might use lua or not, we will see; in any case I would
like to keep the plugin support generic and (complete but) minimal.

I will now list the aspects of the communication between TeX and the
engine which I have thought of so far. I will be glad if you can just think
about them for the moment and give me some feedback if you will.

* single paragraph stuff

I need:
(1) complete representation of all the stuff which is to be returned formatted
(2) sizes of all the objects which are involved in formatting
(3) properties which influence the formatting (breakable, discardable, ...)

It seems that the standard output of \showlists (or \showbox) will mostly do.
(1) is fulfilled I guess (the returned input needs to be only slightly
modified to fit TeX).

(2) is a little bit tricky, because for the characters I get only an id of
the font. So I will need to know the exact reference to a real font to get
the metric information. This can be learned by e.g. \show\tenrm. But of
course it is not known in advance which fonts are used in the paragraph.
Either all fonts can be listed at the beginning (but where would I get the
list of all font definitions? and the definitions can actually change in
the middle of the paragraph), or I can make a first pass, collect the font
ids and ask for them in the second pass. It will be a bit tricky and won't
be reliable due to redefinitions (I can also change the current id using
\let and lose the old id, which is still used in the log, right?), so it
will be OK for experimenting, but for a real version I will need better
support from TeX.
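
The two-pass idea could be sketched roughly as follows. This is only an
illustration in Python, not part of any existing tool: the function names
and the keyword list are my assumptions, and it assumes the usual log
format in which a character node is printed as the font identifier, a
space, and the character (e.g. ".\tenrm H"). The first pass collects the
font ids, and the second pass is prepared by generating one \show command
per id (via \csname, so that ids containing non-letters also work).

```python
import re

# Node keywords printed in box listings that are NOT font identifiers.
# This list is an assumption and is probably incomplete.
NODE_KEYWORDS = {
    "hbox", "vbox", "glue", "penalty", "kern", "rule", "discretionary",
    "math", "mathon", "mathoff", "mark", "insert", "vadjust", "special",
    "write", "leaders", "cleaders", "xleaders",
}

# Leading dots encode the nesting depth in the listing.
LEADING_NAME = re.compile(r"\.*\\([A-Za-z]+)")
CHAR_NODE = re.compile(r"\.*\\(\S+) \S")


def collect_font_ids(listing):
    """First pass: return the font ids used by character nodes, in order."""
    font_ids = []
    for line in listing.splitlines():
        name = LEADING_NAME.match(line)
        if name is None or name.group(1) in NODE_KEYWORDS:
            continue
        node = CHAR_NODE.match(line)
        if node and node.group(1) not in font_ids:
            font_ids.append(node.group(1))
    return font_ids


def show_commands(font_ids):
    """Second pass input: one \\show request per collected font id."""
    return ["\\expandafter\\show\\csname " + fid + "\\endcsname"
            for fid in font_ids]


SAMPLE = r"""\hbox(6.94444+0.0)x20.0
.\tenrm H
.\tenbf i
.\glue 3.33333 plus 1.66666 minus 1.11111
.\tenrm !
.\penalty 10000"""

if __name__ == "__main__":
    for command in show_commands(collect_font_ids(SAMPLE)):
        print(command)
```

Note that this does nothing about the redefinition problem: if an id has
been redefined between the first pass and the \show, the answer will
describe the wrong font.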

(3) is implicit, right?
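
To make the single-paragraph input more concrete, here is a minimal sketch
(in Python, purely for illustration) of turning a \showbox-style listing
into a stream of typed items. The class names, the parse_listing function
and the list of skipped node types are hypothetical; only characters, glue
and penalties are handled, and ligatures, kerns, discretionaries and nested
boxes are simply ignored. The properties from (3) are then implied by the
item type (glue is breakable and discardable, a penalty carries its cost).

```python
import re
from dataclasses import dataclass


@dataclass
class Char:
    font: str   # font id as printed in the log, e.g. "tenrm"
    char: str


@dataclass
class Glue:
    natural: float
    stretch: str  # kept as text, since it may carry fil/fill/filll units
    shrink: str


@dataclass
class Penalty:
    value: int


GLUE = re.compile(
    r"\\glue(?:\([^)]*\))? (-?[\d.]+)(?: plus (\S+))?(?: minus (\S+))?")
PENALTY = re.compile(r"\\penalty (-?\d+)")
NAME = re.compile(r"\\([A-Za-z]+)")
CHAR = re.compile(r"\\(\S+) (\S)")

# Node types this sketch does not model (assumed, incomplete list).
SKIPPED = {"hbox", "vbox", "kern", "rule", "discretionary", "math",
           "mathon", "mathoff", "mark", "insert", "vadjust", "special",
           "write", "leaders", "cleaders", "xleaders"}


def parse_listing(text):
    """Return the Char/Glue/Penalty items found in a box listing."""
    items = []
    for raw in text.splitlines():
        line = raw.strip().lstrip(".")  # leading dots are nesting depth
        glue = GLUE.match(line)
        penalty = PENALTY.match(line)
        name = NAME.match(line)
        if glue:
            items.append(Glue(float(glue.group(1)),
                              glue.group(2) or "0", glue.group(3) or "0"))
        elif penalty:
            items.append(Penalty(int(penalty.group(1))))
        elif name is None or name.group(1) in SKIPPED:
            continue
        else:
            char = CHAR.match(line)
            if char:
                items.append(Char(char.group(1), char.group(2)))
    return items


SAMPLE = r"""\hbox(6.94444+0.0)x20.0
.\tenrm H
.\tenrm i
.\glue 3.33333 plus 1.66666 minus 1.11111
.\penalty 10000
.\glue(\parfillskip) 0.0 plus 1.0fil"""

if __name__ == "__main__":
    for item in parse_listing(SAMPLE):
        print(item)
```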

* stream of paragraphs

I may even need the whole chapter, because I want to treat
- shapes and layouts, which are relative to the page and not to a particular
  paragraph
- pagination, float placement

For basic experimenting I can redefine \par to something like
\hfil\break\indent, but that will restrict all kinds of things which can
happen between the paragraphs (in vertical mode). Of course, the whole
thing will never be compatible with TeX, because after \par TeX expects
that the last paragraph has been formatted and placed on the vertical
list. So it will be the responsibility of the user/macro-programmer to bear
the consequences of using the alternative mechanism. Nevertheless, the
consequences should be as small as possible.

So for prototyping I can redefine \par, or perhaps I can store whole
paragraphs in infinite hboxes (redefining \hsize?), or maybe I can use some
\specials for tagging, but for the production version this will be a very
tricky part, not so much for the engine as on the TeX side. It should be
of great concern to people who would want to use the new algorithms in
their systems (Hans?), once those ideas have been tested by a
prototype :-).

* passing the parameters specific to the new algorithms

- layouts, shapes
- maybe others, like weights for resolving paragraph versus page breaking

This will be a new thing so I hope that there is no compatibility burden.

* hyphenation

It will be a lot of additional work, but I think that I should handle it
locally. There are two reasons:

(1) the protocol for failing and getting the list with new discretionaries
(TeX's 2nd pass) for every individual paragraph would be extremely
complicated; in the end it might be more difficult than handling it locally.

(2) TeX's hyphenation mechanism is IMHO one of the crappiest parts of TeX.
I mean the way the (non)ligatures are screwed up for discretionaries
which are not used in the end. So if it is handled locally, it will IMO be
simpler and more correct. There are also some research results concerning
hyphenation which are not implemented in TeX, because it would be too
complicated.

In the first stage, I'll omit hyphenation completely.

At the moment, I can't think of anything else. I'm looking forward to your
feedback.

--ksk

Karel Skoupý