# [Dev-luatex] plugin for external formatting

Thanh Han The hanthethanh at gmail.com
Wed Sep 21 15:25:50 CEST 2005

```
Hi,

this is a long thread and too many things are discussed at
the same time, so I will need some time to read and
understand what is going on.

My first thought is that some small modifications to
\showlists and \showbox will help a lot. It's easy to write
additional info, like the dimensions of each item in the list or,
in the case of characters, the filename of the tfm with the font size
(or we may write the dimensions of each char, as Hans
suggested, but this is overkill IMHO).

My feeling is that we need to work out the specification and
format of the ``node list'' first. In the first step, I
would prefer to have only node-specific things, e.g. only what
comes out after a box construction. I also got a similar
request: to provide a primitive that writes out the content
of a box and another primitive to reconstruct that box.
Extensions can come later on.

At the moment I cannot see clearly what is needed, but I am
willing to write some extensions that we can experiment with
to see what is really needed, and perhaps change what has been done.

Thanh

On Tue, Sep 20, 2005 at 11:58:08AM +0200, Karel Skoupy wrote:
> Hi all,
>
> as Hans has already mentioned, my concern with luatex is to have some
> interface/protocol for formatting the TeX stuff externally.
>
> Without going into details now, I'm interested in alternative algorithms for
> formatting not only paragraphs, but the whole stream. For TUG 2005 I have
> written a prototype which doesn't use any TeX code at all (it just
> parasitizes on ADvi code for getting some metric information and showing the
> results). For a long time I planned to make a whole new system from scratch,
> but for several reasons that was reconsidered, and Hans proposed a way
> to cooperate with TeX (a plugin mechanism for an external engine), so that TeX
> could benefit from the new algorithm and I can concentrate on the core
> stuff.
>
> So basically I need a stream of (character) boxes, glues, penalties, ...
> (is there a simple unambiguous term for all that?) in a preprocessed
> form (I don't care about input and macro handling) plus some parameters
> (standard paragraph breaking parameters and the new special ones) and I
> will return a stream of fixed boxes.
>
> 'I' will often mean 'the engine' depending on the context :-)
>
> In the first stage, I won't need lua (or any changes to TeX) at all. I
> plan to use \showlists for my input stream and to generate a standard
> TeX input file for reading the result back. Of course, it won't be so
> simple; there will be some macro programming and trickery, which will
> make the whole thing complicated, fragile, unreliable, and inefficient for
> real use. Therefore some hooks from the actively developed TeX will
> probably be useful for making the cooperation of TeX and the external
> engine smooth. It might use lua or not, we will see; in any case I would
> like to keep the plugin support generic and (complete but) minimal.
>
> I will now list the aspects of the communication between TeX and the
> engine which I have thought of so far. I would be glad if you could just think
> about it for the moment and give me some feedback.
>
> * single paragraph stuff
>
> I need:
> (1) complete representation of all the stuff which is to be returned formatted
> (2) sizes of all the objects which are involved in formatting
> (3) properties which influence the formatting (breakable, discardable, ...)
>
> It seems that the standard output of \showlists (or \showbox) will mostly do.
> (1) is fulfilled I guess (the returned input needs to be only slightly
> modified to fit TeX).
>
> (2) is a little bit tricky, because for the characters I get only an id of
> the font. So I will need the exact reference to a real font to get
> the metric information. This can be learned by e.g. \show\tenrm. But of
> course it is not known in advance which fonts are used in the paragraph, so
> either all fonts can be listed at the beginning (but where do I get the list of
> all font definitions? and the definitions can actually change in the middle
> of the paragraph), or I can make a first pass, collect the font ids and
> ask for them in the second pass. It will be a bit tricky and won't be
> reliable due to redefinitions (I can also change the current id using \let
> and lose the old id, which is still used in the log, right?), so it will be OK for
> experimenting, but for a real version I will need better support from
> TeX.
>
> (3) is implicit, right?
>
> * stream of paragraphs
>
> I may even need the whole chapter, because I want to treat
> - shapes and layouts, which are relative to the page and not to a particular
>   paragraph
> - pagination, float placement
>
> For basic experimenting I can redefine \par to something like
> \hfil\break\indent, but that will restrict all kinds of things which can
> happen between the paragraphs (in vertical mode). Of course, the whole
> thing will never be compatible with TeX, because TeX expects after \par that
> the last paragraph was formatted and placed on the vertical list. So it
> will be the responsibility of the user/macro programmer to bear the
> consequences of using the alternative mechanism. Nevertheless, the
> consequences should be as small as possible.
>
> So for prototyping I can redefine \par, or perhaps I can store whole
> paragraphs in infinitely wide hboxes (redefining \hsize?), or maybe I can use some
> \specials for tagging, but for the production version this will be a very
> tricky part. Not so much for the engine, but mainly on the TeX side. It
> should be of great concern for people who want to use the new
> algorithms in their systems (Hans?), (after those ideas are first tested by
> a prototype :-).
>
> * passing the parameters specific to the new algorithms
>
> - layouts, shapes
> - maybe others, like weights for resolving paragraph versus page breaking
>
> This will be a new thing so I hope that there is no compatibility burden.
>
> * hyphenation
>
> It will be a lot of additional work, but I think that I should handle it
> locally. There are two reasons:
>
> (1) the protocol for failing and getting the list with new discretionaries
> (TeX's 2nd pass) for every individual paragraph would be extremely
> complicated; in the end it might be more difficult than handling it locally.
>
> (2) TeX's hyphenation mechanism is IMHO one of the crappiest parts of TeX.
> I mean the way the (non)ligatures are screwed up for discretionaries
> which are not used in the end. So if it is handled locally, it will be IMO
> simpler and more correct. There are also some research results concerning
> hyphenation which are not implemented in TeX, because that would be too
> complicated.
>
> At the first stage, I'll omit hyphenation completely.
>
> At the moment, I don't remember anything else. I'm looking forward to your
> feedback.
>
> --ksk
> _______________________________________________
> Dev-luatex mailing list
> Dev-luatex at ntg.nl
> http://www.ntg.nl/mailman/listinfo/dev-luatex
>
```
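The thread proposes one primitive that writes out the content of a box and another that rebuilds it. It also suggests having each character carry its tfm file name and font size instead of an internal font id. The Python sketch below round-trips a list of character, glue, and penalty items through a line-oriented text format to make that idea concrete.

It is a minimal sketch under stated assumptions. The node classes, field names, the one-item-per-line format, and the use of points as the unit are all hypothetical. It does not reproduce the real \showbox/\showlists output or any pdfTeX/LuaTeX interface.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical node types modelled loosely on the box/glue/penalty items
# that \showlists prints. Field names and units (points) are assumptions.

@dataclass(frozen=True)
class Char:
    tfm: str         # font file name (must not contain spaces in this format)
    size_pt: float   # font size in points
    code: int        # character code in the font
    width_pt: float  # character width in points

@dataclass(frozen=True)
class Glue:
    width_pt: float
    stretch_pt: float
    shrink_pt: float

@dataclass(frozen=True)
class Penalty:
    value: int

Node = Union[Char, Glue, Penalty]

def dump(nodes: List[Node]) -> str:
    """Write one node per line. Characters carry their font file and size
    rather than an internal font id, so no second lookup pass is needed."""
    lines = []
    for n in nodes:
        if isinstance(n, Char):
            # repr() of a float round-trips exactly through float()
            lines.append(f"char {n.tfm} {n.size_pt!r} {n.code} {n.width_pt!r}")
        elif isinstance(n, Glue):
            lines.append(f"glue {n.width_pt!r} {n.stretch_pt!r} {n.shrink_pt!r}")
        else:
            lines.append(f"penalty {n.value}")
    return "\n".join(lines)

def load(text: str) -> List[Node]:
    """Rebuild the node list from text produced by dump()."""
    nodes: List[Node] = []
    for line in text.splitlines():
        kind, *f = line.split()
        if kind == "char":
            nodes.append(Char(f[0], float(f[1]), int(f[2]), float(f[3])))
        elif kind == "glue":
            nodes.append(Glue(float(f[0]), float(f[1]), float(f[2])))
        elif kind == "penalty":
            nodes.append(Penalty(int(f[0])))
        else:
            raise ValueError(f"unknown node kind: {kind}")
    return nodes

if __name__ == "__main__":
    # Illustrative values only, not real cmr10 metrics.
    para = [Char("cmr10", 10.0, 72, 7.5), Glue(3.33, 1.67, 1.11), Penalty(10000)]
    text = dump(para)
    print(text)
    assert load(text) == para
```

The font is written as a file name plus a size rather than as a control-sequence id. This lets an external engine locate metrics directly, without the \show\tenrm-style second pass described above. The cost is slightly longer output.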