[Dev-luatex] plugin for external formatting

Hans Hagen pragma at wxs.nl
Tue Sep 20 16:03:08 CEST 2005

Karel Skoup wrote:

>* single paragraph stuff
>I need:
>(1) complete representation of all the stuff which is to be returned formatted
>(2) sizes of all the objects which are involved in formatting
>(3) properties which influence the formatting (breakable, discardable, ...)
If this paragraph crosses a page, you may need to know the available 
room as well, so things like \pagegoal and \pagetotal also need to be 
communicated (and maybe also the left/right page state, if the shape is 
page dependent, i.e. bound to the inner or outer margin).

>It seems that the standard output of \showlists (or \showbox) will mostly do.
>(1) is fulfilled I guess (the returned input needs to be only slightly
>modified to fit TeX).
It will probably do for the first prototype; I can imagine that you 
implement several strategies:

- simple paragraph, on page
- more boundary conditions
- possible page crossing
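One way to pick among these strategies is sketched below, assuming a crude height estimate (material cut into \hsize-wide chunks times \baselineskip) and a caller-supplied flag for extra boundary conditions; the names Strategy, estimate_height and choose_strategy are hypothetical.

```python
import math
from enum import Enum


class Strategy(Enum):
    SIMPLE = "simple paragraph, on page"
    CONSTRAINED = "more boundary conditions"
    PAGE_CROSSING = "possible page crossing"


def estimate_height(natural_width: float, hsize: float, baselineskip: float) -> float:
    # Crude estimate: number of lines if the material were cut into hsize chunks.
    lines = max(1, math.ceil(natural_width / hsize))
    return lines * baselineskip


def choose_strategy(estimated_height: float, room: float, has_constraints: bool) -> Strategy:
    # A paragraph that may not fit in the remaining room needs the page-crossing variant.
    if estimated_height > room:
        return Strategy.PAGE_CROSSING
    # Otherwise extra conditions (shapes, nearby objects) call for the richer variant.
    if has_constraints:
        return Strategy.CONSTRAINED
    return Strategy.SIMPLE
```

The check order matters: a possible page crossing overrides the other cases, since it is the only one that needs the page state at all.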


>(2) is little bit tricky, because for the characters I get only an id of
>the font. So I will need to know the exact reference to a real font to get
>the metrics information. This can be learned by eg. \show\tenrm. But of
>course it is not known in advance what fonts are used in the paragraph, so
>either all fonts can be listed at the beginning -- but where to get the list of
>all font definitions, and the definitions can actually change in the middle
>of the paragraph -- or I can make a first pass, collect the font ids and
>ask for them in the second pass. It will be a bit tricky and won't be
>reliable due to redefinitions (I can also change the current id using \let
>and lose the old id (still used in the log), right?), so it will be OK for
>experimenting but for a real version, I will need a better support from
Why not resolve that info beforehand? Since the order does not change, 
I can imagine passing chars as some kind of special charbox (wd, dp, ht + 
ref); when reading back, the ref can be used to insert the char node 
again. We don't need to save bytes -)
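The charbox idea can be sketched as a round trip, assuming the font metrics are already resolved on the TeX side into a lookup table (CharNode, CharBox, encode and decode are hypothetical names, and the metric values in the usage are made up): the external formatter only sees dimensions plus a ref, and the ref restores the original node afterwards.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CharNode:
    font: str  # hypothetical resolved font identifier, e.g. "tenrm"
    char: str


@dataclass(frozen=True)
class CharBox:
    wd: float
    ht: float
    dp: float
    ref: int   # index back into the table of original nodes


Metrics = Dict[Tuple[str, str], Tuple[float, float, float]]  # (font, char) -> (wd, ht, dp)


def encode(nodes: List[CharNode], metrics: Metrics) -> Tuple[List[CharBox], List[CharNode]]:
    """Replace each char node by a box carrying its dimensions and a back reference."""
    table: List[CharNode] = []
    boxes: List[CharBox] = []
    for node in nodes:
        wd, ht, dp = metrics[(node.font, node.char)]
        boxes.append(CharBox(wd, ht, dp, ref=len(table)))
        table.append(node)
    return boxes, table


def decode(boxes: List[CharBox], table: List[CharNode]) -> List[CharNode]:
    """Restore the original char nodes from the refs of the returned boxes."""
    return [table[b.ref] for b in boxes]
```

Because the dimensions travel with every box, the formatter never needs to know which font an id points to, and a later \let of the font id cannot confuse it.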

>(3) is implicit, right?
Is hyphenation known at that time? If I got it right, TeX only looks at 
places where breakpoints make sense, so you don't get all possible 
hyphenation points, unless we let TeX do a pre-break run with a zero 
\hsize so that we get 'em all.
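Reading the points back from such a run could look like the sketch below, assuming (hypothetically) that the run hands over the lines of one word as strings and that a line ending in "-" was broken at a discretionary whose pre-break text is that hyphen; hyphen_points and show are illustrative names.

```python
from typing import List


def hyphen_points(lines: List[str]) -> List[int]:
    """Offsets into the unhyphenated word where the zero-hsize run broke it."""
    points: List[int] = []
    offset = 0
    for line in lines[:-1]:  # the last line ends the word, not at a break
        hyphenated = line.endswith("-")
        text = line[:-1] if hyphenated else line
        offset += len(text)
        if hyphenated:
            points.append(offset)
    return points


def show(word: str, points: List[int]) -> str:
    # Re-insert the hyphens, in the style of TeX's \showhyphens output.
    parts, prev = [], 0
    for p in points:
        parts.append(word[prev:p])
        prev = p
    parts.append(word[prev:])
    return "-".join(parts)
```

So the lines "hy-", "phen-", "ation" give the points [2, 6], i.e. hy-phen-ation.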

>* stream of paragraphs
>I may even need the whole chapter, because I want to treat
>- shapes and layouts, which are relative to page and not to a particular
>  paragraph
>- pagination, floats placement
Let's talk of chunks instead of chapters and moving objects instead of 
floats -)

This is not easy, so it will be a stepwise refinement of the specs: 
something like a sequence of master shapes (normally rectangular text 
areas), frozen forbidden areas (anchored on pages) and movable 
forbidden/reserved areas (afterwards they may get content overlaid). We 
need some kind of 'special' mechanism where certain places in the 
constructed list can get postprocessed, have things attached, etc.
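Once the forbidden areas have positions (frozen ones from their page anchors, movable ones after placement), a line only needs the free horizontal room left in the master shape. A minimal sketch, assuming rectangular shapes and areas with y growing downward; Rect and free_segments are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rect:
    x0: float
    y0: float
    x1: float
    y1: float  # y grows downward, as on a page


def free_segments(area: Rect, forbidden: List[Rect],
                  top: float, bottom: float) -> List[Tuple[float, float]]:
    """Horizontal intervals of `area` usable by a line spanning [top, bottom],
    after removing every forbidden rectangle that overlaps that band."""
    if top < area.y0 or bottom > area.y1:
        return []  # the line does not fit vertically in this master shape
    segments = [(area.x0, area.x1)]
    for f in forbidden:
        if f.y1 <= top or f.y0 >= bottom:
            continue  # no vertical overlap with this line
        cut = []
        for a, b in segments:
            if f.x1 <= a or f.x0 >= b:
                cut.append((a, b))  # segment untouched by this area
                continue
            if f.x0 > a:
                cut.append((a, f.x0))  # part left of the forbidden area
            if f.x1 < b:
                cut.append((f.x1, b))  # part right of the forbidden area
        segments = cut
    return segments
```

An area in the middle of the column splits a line into two segments, which is exactly the case a plain \parshape cannot express.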

>For the basic experimenting I can redefine \par to something like
>\hfil\break\indent but it will restrict all kinds of things which can
>happen between the paragraphs (in vertical mode). Of course, the whole
>thing will never be compatible with TeX, because TeX expects after \par that
>the last paragraph was formatted and placed on the vertical list. So it
>will be responsibility of the user/macro-programmer to bear the
>consequences of using the alternative mechanism. Nevertheless, the
>consequences should be as small as possible.
Redefining \par will mess up a lot of things.

>So for the prototyping I can redefine \par or perhaps I can store the whole
>paragraphs in infinite hboxes (redefining \hsize?) or maybe I can use some
>\specials for tagging, but for the production version, this will be a very
>tricky part. Not so much for the engine, but mainly on the TeX side. It
>should be of a great concern for people who would want to use the new
>algorithms in their systems (Hans?), (after those ideas are first tested by
>a prototype :-).
See Taco's mail; we should build a list writer.

>* passing the parameters specific to the new algorithms
>- layouts, shapes
>- maybe others, like weights for resolving paragraph versus page breaking
>This will be a new thing so I hope that there is no compatibility burden.
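Such weights could, for instance, enter a combined cost as in the sketch below. This is an assumption, not a spec: the weighted sum of line demerits and page badness, and the names combined_cost and best, are illustrative only.

```python
from typing import List, Sequence, Tuple


def combined_cost(line_demerits: Sequence[float], page_badness: float,
                  w_par: float = 1.0, w_page: float = 1.0) -> float:
    # Weighted sum: w_par scales the paragraph's total demerits, w_page the page's badness.
    return w_par * sum(line_demerits) + w_page * page_badness


def best(candidates: List[Tuple[str, Sequence[float], float]],
         w_par: float = 1.0, w_page: float = 1.0) -> str:
    # Pick the candidate (name, line demerits, page badness) with the lowest combined cost.
    return min(candidates, key=lambda c: combined_cost(c[1], c[2], w_par, w_page))[0]
```

With equal weights a looser paragraph that gives a good page break wins; lowering w_page lets the better-set paragraph win at the expense of the page.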
>* hyphenation
Indeed, in the end this is needed.

>It will be a lot of additional work, but I think that I should handle it
>locally. There are two reasons:
>(1) the protocol for failing and getting the list with new discretionaries 
>(TeX's 2nd pass) for every individual paragraph would be extremely
>complicated, in the end it might be more difficult than handling it locally.
Indeed, messing around with TeX's list is painful (reconstructing, 
ligature mess, etc.); I can even imagine that you implement it in such a 
way that we can use it as an alternative to the existing one (basically 
the simple paragraph variant).

>(2) TeX's hyphenation mechanism is IMHO one of the crappiest parts of TeX.
>I mean the way how the (non)ligatures are screwed up for discretionaries
>which are not used in the end. So if it is handled locally, it will be IMO
Well, it works for English, which was the objective; DEK would 
rightfully react with: then why did nobody adapt it, replace it, etc. -)

>simpler and more correct. There are also some research results concerning
>hyphenation, which are not implemented in TeX, because it would be too
Right, we need to add things like compound word hyphenation and dictionary 
support (in order to handle words that don't need the ligatures, etc.).

>At the first stage, I'll omit the hyphenation completely.
Or maybe some poor man's alternative: let TeX give the list with all 
points and remove them when needed (OK, we lose kerning in the process), 
but it may look better than no hyphenation at all -)
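The removal step can be sketched with TeX-style discretionaries (pre-break, post-break and no-break text). The list model below is a simplification assumed for illustration (plain strings, no widths or kerns, hypothetical names Disc and resolve): discretionaries chosen as breaks end a line, all others collapse to their no-break text.

```python
from dataclasses import dataclass
from typing import List, Set, Union


@dataclass(frozen=True)
class Disc:
    pre: str = "-"     # text before the break (the hyphen)
    post: str = ""     # text after the break
    nobreak: str = ""  # text used when no break happens here


Node = Union[str, Disc]


def resolve(nodes: List[Node], breaks: Set[int]) -> List[str]:
    """Turn a list carrying every discretionary into lines: discretionaries whose
    index is in `breaks` become line breaks, all others use their no-break text."""
    lines: List[str] = []
    current: List[str] = []
    for i, n in enumerate(nodes):
        if isinstance(n, Disc):
            if i in breaks:
                current.append(n.pre)
                lines.append("".join(current))
                current = [n.post]
            else:
                current.append(n.nobreak)
        else:
            current.append(n)
    lines.append("".join(current))
    return lines
```

The three-part discretionary also covers cases where the spelling changes at the break, such as the old German ck rule (backen, hyphenated bak-ken).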


                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
                                             | www.pragma-pod.nl
