[Dev-luatex] plugin for external formatting

Hans Hagen pragma at wxs.nl
Wed Sep 21 10:52:08 CEST 2005

Karel Skoupý wrote:

>>  \hsize=2in \the\list0 \par % typeset the node list
>So \the\list0 will expand to tokens (consistent with \write), right? 
>It won't just insert the list on the currently active list (would be
>inconsistent with \write), right?

btw, we have the same situation with lua: 
\lua{tex.print("\string\\relax")} results in just the word \relax being 
typeset, so in order to get it texed we need to feed it into \scantokens

so, i can imagine that there is something like \scanlist\expandafter{\the\toks0}

>Sure, that's the multiple paragraph (stream) stuff. It will be the
>really tricky part, not so much for me, but in TeX, the whole model must
>be generalized/extended. It's not yet very clear to anybody, or is it?
>I think it's a real research topic.
indeed, stepwise refinement (start small -)

>OK, but that won't bring much, just some funny shapes.
sure, but on the other hand, it can be used to 'replace' the current par 
builder by a more advanced (e.g. hyphenation) one, imagine that we have:

  {write list to file (or pipe)
   call plugin in one-paragraph mode
   read list from file (or pipe)}
that way we can replace the current par builder, because by default it's 
something equivalent to:


i wonder how hard this is to implement, you and taco should know -)
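
To make the write/call/read round trip above concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the JSON-lines node format, the function names, and the greedy stand-in plugin are hypothetical and are not part of pdftex or of any existing plugin interface. The plugin only ever sees and returns text, which is what would travel over a file or pipe.

```python
import json

def serialize(nodes):
    """Write a node list as one JSON object per line (hypothetical wire format)."""
    return "\n".join(json.dumps(node, sort_keys=True) for node in nodes)

def deserialize(text):
    """Read a node list back from the line-oriented form."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def greedy_plugin(text, hsize):
    """Stand-in 'one-paragraph mode' plugin: a greedy first-fit line breaker.

    A glue node is replaced by a break node when the following box
    would no longer fit on the current line.
    """
    out, line_width, pending_glue = [], 0, None
    for node in deserialize(text):
        if node["type"] == "glue":
            pending_glue = node
            continue
        if pending_glue is not None:
            if line_width + pending_glue["width"] + node["width"] > hsize:
                out.append({"type": "break"})
                line_width = 0
            else:
                out.append(pending_glue)
                line_width += pending_glue["width"]
            pending_glue = None
        out.append(node)
        line_width += node["width"]
    return serialize(out)

def run_plugin(nodes, plugin, **options):
    """Write the list out, call the plugin, read the list back in."""
    return deserialize(plugin(serialize(nodes), **options))

paragraph = [
    {"type": "box", "width": 3}, {"type": "glue", "width": 1},
    {"type": "box", "width": 3}, {"type": "glue", "width": 1},
    {"type": "box", "width": 3},
]
broken = run_plugin(paragraph, greedy_plugin, hsize=8)
print([node["type"] for node in broken])
```

Swapping greedy_plugin for a different function with the same text-in, text-out signature is the whole point of the sketch: the caller does not need to know how the breaks were chosen.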

>>- more boundary conditions
>>- possible page crossing
>Not only page crossing, but also column/shape/container crossing ...
>The problem is that we are used to \parshape, which just specifies
>something for certain lines in the current paragraph. But if we want to
>introduce real page layouts, then the shapes are not relative to the
>paragraphs any more. It will be a matter of formatting where a
>particular paragraph starts in the layout.
it's a combination:

- a main gutter shape (can be columns or whatever)
- shapes bound to places on the gutter
- shapes bound to specific places in the stream
- shapes that may float (within boundary condition)
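
The four kinds of shape listed above could be modelled roughly as follows. This is only a sketch under simplifying assumptions: every shape is a rectangle cut out of one column, and stream-bound and floating shapes are assumed to have been resolved to page coordinates already. The names Binding, Shape and line_width are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class Binding(Enum):
    GUTTER = "main gutter shape"
    GUTTER_PLACE = "bound to a place on the gutter"
    STREAM_PLACE = "bound to a place in the stream"
    FLOATING = "may float within boundary conditions"

@dataclass
class Shape:
    binding: Binding
    top: int      # vertical position on the page, after resolution
    height: int
    width: int    # horizontal space the shape takes out of the column

def line_width(column_width, shapes, y):
    """Width left for a line at vertical position y, whatever paragraph it belongs to."""
    cut = sum(
        s.width
        for s in shapes
        if s.binding is not Binding.GUTTER and s.top <= y < s.top + s.height
    )
    return max(column_width - cut, 0)

shapes = [Shape(Binding.GUTTER_PLACE, top=20, height=30, width=40)]
print([line_width(100, shapes, y) for y in (10, 20, 49, 50)])
```

Unlike \parshape, the available width here depends on where a line lands on the page, not on its line number within the paragraph.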

>Sure, that would be great. Then I won't have to access metric files at
>all. But should I wait for that? I wanted to start with the \showlists
>output for prototyping. Well, I'll see how fast will I progress. Maybe,
>that you'll be faster :-).
ok, i know you don't like messing around with the tex source, but i can 
imagine that this showlist stuff is doable, so if you want, you can 
provide patches to the web source; we're working with a branch of pdftex 

>But concerning the metric files, if I want to treat hyphenation locally,
>then I also need the kerning and ligature programs. In TeX it is done
>too early (and then it is taken apart and (wrongly) reconstructed during
>hyphenation pass). I want to do ligatures and kernings on demand,
>basically after hyphenation (it's not that simple, but anyway).
how about a font daemon; that one could cache/access font files; we need 
to go open type anyway so maybe such a daemon can be built on top of 
existing (non tex) libraries (port 31415)
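
As a rough illustration of the caching half of that idea (without any networking), the following Python sketch loads a font's kerning and ligature tables once and answers pair lookups on demand. The FontCache class, the loader callback and all metric values are made up for the example; a real daemon would sit on top of an actual font library.

```python
class FontCache:
    """Loads each font at most once and serves kern/ligature queries from memory."""

    def __init__(self, loader):
        self._loader = loader   # hypothetical callback: font name -> metrics dict
        self._fonts = {}
        self.loads = 0          # counts real loads, to show the cache working

    def metrics(self, name):
        if name not in self._fonts:
            self._fonts[name] = self._loader(name)
            self.loads += 1
        return self._fonts[name]

    def kern(self, name, left, right):
        """Kern between two characters, 0 if the font defines none."""
        return self.metrics(name)["kerns"].get((left, right), 0)

    def ligature(self, name, left, right):
        """Ligature glyph for a character pair, or None."""
        return self.metrics(name)["ligatures"].get((left, right))

def demo_loader(name):
    # Invented numbers, standing in for data read from a font file.
    return {"kerns": {("A", "V"): -80}, "ligatures": {("f", "i"): "fi"}}

cache = FontCache(demo_loader)
print(cache.kern("demo", "A", "V"), cache.ligature("demo", "f", "i"), cache.loads)
```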

>NO. It screws up everything, not only taken or potential breaks, but
>even the potential hyphenation points which are never considered a
>break. It is also known too late, in the middle of the (atomic)
>paragraph breaking process.
ok, so that's a dead end

>>hyphenation points, unless we let tex do a pre break run with a zero 
>>hsize so that we get 'm all
>No, no, it's much more stupid than you think. TeX first builds the
>horizontal list with all kernings and ligatures, taking {} (in
>dif{}ferent) into account. Then it tries the first breaking pass with the
>\pretolerance. If that fails, then it takes the whole list, tries to
>hyphenate *all* words in the lists, inserts the explicit
>\discretionaries to *every* potential hyphen and reconstructs the
>kernings and ligatures for the segments between the \discretionaries,
>losing all ligature preventions and yielding potentially incorrect
>ligatures and kernings for words which are actually not hyphenated.
>Then it tries the second (and maybe third) pass, but it loses the
>originally built list forever. The whole breaking is an atomic operation
>(happening at \par), you can't do anything between the passes.
>Taco, is that correct, or am I too TeX unfriendly?
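
The effect described in the quoted text can be reproduced in a toy model. The Python sketch below is not TeX's actual reconstitution code; the two-entry ligature table, the Disc tuple and the function names are assumptions made for the illustration. It builds a list for dif{}ferent with the ff ligature suppressed, then rebuilds it around hyphenation points the way the quote describes, and the ligature comes back in the unbroken form of the word.

```python
from collections import namedtuple

# A discretionary: text before the break, after the break, and if no break is taken.
Disc = namedtuple("Disc", "pre post nobreak")

# Toy ligature table; a ligature glyph is written as a two-character string.
LIGATURES = {("f", "f"): "ff", ("f", "i"): "fi"}

def build_hlist(source):
    """First build: form ligatures, except where '{}' in the source prevents one."""
    nodes, blocked, i = [], False, 0
    while i < len(source):
        if source.startswith("{}", i):
            blocked = True
            i += 2
            continue
        ch = source[i]
        if nodes and not blocked and (nodes[-1], ch) in LIGATURES:
            nodes[-1] = LIGATURES[(nodes[-1], ch)]
        else:
            nodes.append(ch)
        blocked = False
        i += 1
    return nodes

def hyphenate_and_rebuild(nodes, break_positions):
    """Hyphenation pass: flatten to characters (the '{}' is gone), then rebuild."""
    chars = "".join(nodes)
    cuts = [0] + sorted(break_positions) + [len(chars)]
    segs = [chars[a:b] for a, b in zip(cuts, cuts[1:])]
    discs = []
    for k in range(len(segs) - 1):
        left, right = segs[k], segs[k + 1]
        if left and right and (left[-1], right[0]) in LIGATURES:
            # The pair straddling the break becomes a ligature in the no-break text.
            discs.append(Disc(left[-1] + "-", right[0], LIGATURES[(left[-1], right[0])]))
            segs[k], segs[k + 1] = left[:-1], right[1:]
        else:
            discs.append(Disc("-", "", ""))
    rebuilt = build_hlist(segs[0])
    for disc, seg in zip(discs, segs[1:]):
        rebuilt.append(disc)
        rebuilt.extend(build_hlist(seg))
    return rebuilt

def unbroken(items):
    """What gets typeset when none of the discretionary breaks is taken."""
    glyphs = (it.nobreak if isinstance(it, Disc) else it for it in items)
    return [g for g in glyphs if g]

first_pass = build_hlist("dif{}ferent")
second_pass = hyphenate_and_rebuild(first_pass, [3, 6])  # dif-fer-ent
print(first_pass)
print(unbroken(second_pass))
```

The first list keeps the two f's apart as requested; the second contains an ff glyph even though the word is not broken anywhere, and the first list is no longer available at that point.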

that's indeed too hard-coded for our purpose, so, next to a font daemon, 
we need a hyphenation daemon

>Maybe we should make a whole new glossary, for example 'node' is quite
>OK for everything in the list (char, box, glue, penalty, ...), but 'list'
>is so ambiguous, there should be something more specific (maybe 'node
>list'). TeX itself doesn't give clear names (classes) for those objects.
>I had to make them names in NTS (to name the classes), maybe we can look
>into it.
good idea; we indeed need to define proper names and descriptions; can 
you make a proposal for that based on your nts experiences?

>>well, it works for english, which was the objective; DEK would 
>>rightfully react with: then why did nobody adapt it, replace it, etc -)
>High time, huh?
>It works for English (does it really always ?), because it is simple,
>right? I don't know, whether it is a real problem in any other language
>in practice. I just know the code and I think that it is incorrect,
>inconsistent and illogical.
my impression is that the number of missed/wrong cases for english is so 
small that it falls within the 'no problem to correct it manually' 
criteria; languages with compound words, accented characters etc have 
higher demands


                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
                                             | www.pragma-pod.nl
