# [Dev-luatex] plugin for external formatting

Hans Hagen pragma at wxs.nl
Tue Sep 20 16:03:08 CEST 2005

Karel Skoup wrote:

>* single paragraph stuff
>
>I need:
>(1) complete representation of all the stuff which is to be returned formatted
>(2) sizes of all the objects which are involved in formatting
>(3) properties which influence the formatting (breakable, discardable, ...)
>
>
if this paragraph crosses a page, you may need to know the available
room as well, so things like pagegoal and pagetotal also need to be
communicated (maybe also left/right page state if the shape is page
dependent i.e. inner or outer margin bound)

>It seems that the standard output of \showlists (or \showbox) will mostly do.
>(1) is fulfilled I guess (the returned input needs to be only slightly
>modified to fit TeX).
>
>
it will probably do for the first prototype; i can imagine that you
implement several strategies,

- simple paragraph, on page
- more boundary conditions
- possible page crossing

etc

>(2) is a little bit tricky, because for the characters I get only an id of
>the font. So I will need to know the exact reference to a real font to get
>the metrics information. This can be learned by e.g. \show\tenrm. But of
>course it is not known in advance what fonts are used in the paragraph, so
>either all fonts can be listed at the beginning -- but where to get the list of
>all font definitions, and the definitions can actually change in the middle
>of the paragraph -- or I can make a first pass, collect the font ids and
>ask for them in the second pass. It will be a bit tricky and won't be
>reliable due to redefinitions (I can also change the current id using \let
>and lose the old id (still used in the log), right?), so it will be OK for
>experimenting but for a real version, I will need better support from
>TeX.
>
>
Why not resolve that info beforehand? Since the order does not change,
I can imagine passing chars as some kind of special charbox (wd, ht, dp +
ref) and when reading back, the ref can be used to insert the char node
again; we don't need to save bytes -)
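
A minimal sketch of the charbox idea, in Python for illustration only. All names here (`CharNode`, `CharBox`, `export_chars`, `import_chars`) and the shape of the `metrics` table are assumptions, not part of TeX or of any agreed interface: the external formatter only sees dimensions plus a reference, and the reference is used to put the original char node back.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharNode:
    """Hypothetical stand-in for a TeX char node."""
    font_id: int  # TeX-internal font number
    char: str


@dataclass(frozen=True)
class CharBox:
    """What the external formatter sees: dimensions plus a reference."""
    ref: int   # index into the table of original nodes
    wd: float  # width
    ht: float  # height
    dp: float  # depth


def export_chars(nodes, metrics):
    """Resolve metrics beforehand and replace each char by a charbox.

    `metrics` maps (font_id, char) to a (wd, ht, dp) tuple.
    """
    table = list(nodes)
    boxes = [
        CharBox(i, *metrics[(n.font_id, n.char)])
        for i, n in enumerate(table)
    ]
    return table, boxes


def import_chars(table, returned_refs):
    """Re-insert the original char nodes from the returned references."""
    return [table[ref] for ref in returned_refs]
```

Because the order of the characters does not change, the reference can be a plain index; the font never has to be looked up again on the way back.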

>(3) is implicit, right?
>
>
>
is hyphenation known at that time? (if i got it right, tex only looks at
places where breakpoints make sense, so you don't get all possible
hyphenation points, unless we let tex do a pre break run with a zero
hsize so that we get 'em all)

>* stream of paragraphs
>
>I may even need the whole chapter, because I want to treat
>- shapes and layouts, which are relative to page and not to a particular
>  paragraph
>- pagination, floats placement
>
>
let's talk of chunks instead of chapters and moving objects instead of
floats -)

this is not easy, so that will be a stepwise refinement of the specs;
something like a sequence of master shapes (normally rectangular text
areas), frozen forbidden areas (anchored on pages) and movable
forbidden/reserved areas (afterwards they may get content overlayed); we
need some kind of 'special' mechanism where certain places in the
constructed list can get postprocessed/things attached etc

>For the basic experimenting I can redefine \par to something like
>\hfil\break\indent but it will restrict all kinds of things which can
>happen between the paragraphs (in vertical mode). Of course, the whole
>thing will never be compatible with TeX, because TeX expects after \par that
>the last paragraph was formatted and placed on the vertical list. So it
>will be the responsibility of the user/macro-programmer to bear the
>consequences of using the alternative mechanism. Nevertheless, the
>consequences should be as small as possible.
>
>
redefining par will mess up a lot of things

>So for the prototyping I can redefine \par or perhaps I can store the whole
>paragraphs in infinite hboxes (redefining \hsize?) or maybe I can use some
>\specials for tagging, but for the production version, this will be a very
>tricky part. Not so much for the engine, but mainly on the TeX side. It
>should be of a great concern for people who would want to use the new
>algorithms in their systems (Hans?), (after those ideas are first tested by
>a prototype :-).
>
>
see taco's mail, we should build a list writer

>* passing the parameters specific to the new algorithms
>
>- layouts, shapes
>- maybe others, like weights for resolving paragraph contra page breaking
>
>This will be a new thing so I hope that there is no compatibility burden.
>
>* hyphenation
>
>
indeed in the end this is needed

>It will be a lot of additional work, but I think that I should handle it
>locally. There are two reasons:
>
>(1) the protocol for failing and getting the list with new discretionaries
>(TeX's 2nd pass) for every individual paragraph would be extremely
>complicated, in the end it might be more difficult than handling it locally.
>
>
indeed messing around with tex's list is painful (reconstructing,
ligature mess, etc); i can even imagine that you implement it in such a
way that we can use it as alternative for the existing one (basically
the simple paragraph variant)

>(2) TeX's hyphenation mechanism is IMHO one of the crappiest parts of TeX.
>I mean the way how the (non)ligatures are screwed up for discretionaries
>which are not used in the end. So if it is handled locally, it will be IMO
>
>
well, it works for english, which was the objective; DEK would
rightfully react with: then why did nobody adapt it, replace it, etc -)

>simpler and more correct. There are also some research results concerning
>hyphenation, which are not implemented in TeX, because it would be too
>complicated.
>
>
>
right, we need to add things like compound word hyphenation, dictionary
support (in order to handle words that don't need the ligatures, etc)

>At the first stage, I'll omit the hyphenation completely.
>
>
or maybe some poor man's alternative: let tex give the list with all
points and remove them when needed (ok, we lose kerning in the process)
but it may look better than no hyphenation at all -)
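
A rough sketch of that poor man's alternative, again in Python for illustration only. The list representation and the names `DISC` and `drop_unused` are assumptions: the list arrives with every hyphenation point marked, the external formatter picks its breaks, and the unused points are removed afterwards by gluing the neighbouring fragments back together. Kerning and ligatures across a removed point are not restored here, which is the loss mentioned above.

```python
DISC = "\\-"  # hypothetical marker for a discretionary hyphenation point


def drop_unused(items, used):
    """Remove every discretionary whose index is not in `used`.

    `items` is a list of text fragments and DISC markers; `used` is the
    set of indices (into `items`) of the discretionaries that the
    formatter actually broke at. Fragments on either side of a removed
    discretionary are merged into one fragment.
    """
    out = []
    for i, item in enumerate(items):
        if item == DISC:
            if i in used:
                out.append(DISC)  # keep the break that was taken
            continue              # drop all other hyphenation points
        if out and out[-1] != DISC:
            out[-1] += item       # merge with the preceding fragment
        else:
            out.append(item)
    return out
```

For example, `drop_unused(["hy", DISC, "phen", DISC, "ation"], {3})` keeps only the second point and yields `["hyphen", DISC, "ation"]`.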

Hans
