[Dev-luatex] plugin for external formatting

Karel Skoupý skoupy at inf.ethz.ch
Tue Sep 20 22:59:52 CEST 2005

On Tue, 20. Sep 2005, 13:01:23, Taco Hoekwater wrote:
> >At the moment, I don't remember anything else. I'm looking forward to your
> >feedback.
> It will take some time before I fully understand your post, but I want
> to bring up something that Hans and I have talked about recently, namely
> the addition of 'nodelist registers' analogous to \toks registers.

That all sounds great.

> The idea was to have registers and read/write syntax to allow things
> like this (rough ideas, api may change yet):
>   \list0=\unhbox0            % now \list0 contains a node list
>   \list1={Hello world!}      % now \list1 also contains a node list
>   \list2={\hsize=12in }      % error: only node-building allowed
>   \write16{\the\list0 }      % like \showlists, but using a fully
>                              % restorable read syntax

Be careful about redefinitions of fonts in the middle of the
list. They can either be forbidden or reproduced in the read syntax.

For me the fully restorable read syntax is very important (can I get all
the information from standard \showlists now?). I also need all the
context information (font definitions, some parameters). It can be
passed as an extra chunk, if we figure out some protocol. But maybe it
can also all be inlined (see the char dimensions below). Then it won't
be a restorable read syntax (there will be too much), but maybe we can
have \export16\list0 which dumps really everything. I don't mind filtering
out the extras when preparing my output (TeX's input).

>   \hsize=2in \the\list0 \par % typeset the node list

So \the\list0 will expand to tokens (consistent with \write), right?
It won't just append the list to the currently active list (which would be
inconsistent with \write), right?

>   \noindent hello world!
>   \list0=\lastlist           % gobbles the node list from this par
>                              % before anything is done to it.

On Tue, 20. Sep 2005, 16:03:08, Hans Hagen wrote:
> >* single paragraph stuff
> >
> >I need:
> >(1) complete representation of all the stuff which is to be returned formatted
> >(2) sizes of all the objects which are involved in formatting
> >(3) properties which influence the formatting (breakable, discardable, ...)
> > 
> >
> if this paragraph crosses a page, you may need to know the available 
> room as well, so things like pagegoal and pagetotal also need to be 
> communicated (maybe also left/right page state if the shape is page 
> dependent i.e. inner or outer margin bound)

Sure, that's the multiple paragraph (stream) stuff. It will be the
really tricky part, not so much for me, but in TeX, the whole model must
be generalized/extended. It's not yet very clear to anybody, or is it?
I think it's a real research topic.

> >It seems that the standard output of \showlists (or \showbox) will mostly do.
> >(1) is fulfilled I guess (the returned input needs to be only slightly
> >modified to fit TeX).
> > 
> >
> it will probably do for the first prototype; i can imagine that you 
> implement several strategies,
> - simple paragraph, on page

OK, but that won't bring much, just some funny shapes.

> - more boundary conditions
> - possible page crossing

Not only page crossing, but also column/shape/container crossing ...
The problem is that we are used to \parshape, which just specifies
something for certain lines in the current paragraph. But if we want to
introduce real page layouts, then the shapes are not relative to the
paragraphs any more. It will be a matter of formatting where a
particular paragraph starts in the layout.
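To make the difference from \parshape concrete, here is a minimal Python
sketch. Everything in it is hypothetical (the function name, the idea of
describing a container as a plain list of line widths); it is not TeX or
luatex API. The shape belongs to the container, and the widths a paragraph
gets depend on the line where it happens to start:

```python
def paragraph_line_widths(container_widths, start_line, n_lines):
    """Pick the line widths for a paragraph out of a container-level shape.

    container_widths: width of every line slot in the container (hypothetical
                      model of a page/column shape), top to bottom.
    start_line:       index of the slot where the paragraph begins.
    n_lines:          number of lines the paragraph needs.

    Returns (widths, overflow): the widths available in this container and
    the number of lines that must continue in the next container.
    """
    widths = container_widths[start_line:start_line + n_lines]
    overflow = n_lines - len(widths)
    return widths, overflow


# A container with a narrower region in slots 2 and 3 (e.g. around a figure).
container = [300, 300, 200, 200, 300]
print(paragraph_line_widths(container, 2, 2))
print(paragraph_line_widths(container, 3, 4))
```

With \parshape the same paragraph would always get the same widths; here
the same paragraph gets different widths depending on `start_line`, and it
can spill over into the next container.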

> >(2) is a little bit tricky, because for the characters I get only an id of
> >the font. So I will need to know the exact reference to a real font to get
> >the metrics information. This can be learned by e.g. \show\tenrm. But of
> >course it is not known in advance what fonts are used in the paragraph, so
> >either all fonts can be listed at the beginning -- but where to get the
> >list of all font definitions, and the definitions can actually change in the
> >middle of the paragraph -- or I can make a first pass, collect the font ids
> >and ask for them in the second pass. It will be a bit tricky and won't be
> >reliable due to redefinitions (I can also change the current id using \let
> >and lose the old id (still used in the log), right?), so it will be OK for
> >experimenting but for a real version, I will need better support from
> >TeX.
> > 
> >
> Why not resolve that info beforehand? Since the order does not change,
> I can imagine passing chars as some kind of special charbox (wd, dp, ht +
> ref) and when reading back, the ref can be used to insert the char node
> again; we don't need to save bytes -)

Sure, that would be great. Then I won't have to access metric files at
all. But should I wait for that? I wanted to start with the \showlists
output for prototyping. Well, I'll see how fast I progress. Maybe
you'll be faster :-).
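A rough Python sketch of how I read the charbox idea. All names here
(CharBox, export_chars, import_chars, the tuple layout of a node) are
invented for illustration and say nothing about what luatex will actually
provide. The external formatter sees only dimensions plus an opaque ref,
and TeX uses the ref to put the original char node back:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharBox:
    """What the external formatter sees for one character (hypothetical)."""
    ref: int    # opaque reference back to the original char node
    wd: float   # width
    ht: float   # height
    dp: float   # depth


def export_chars(nodes):
    """nodes: list of (char, font_id, wd, ht, dp) tuples.

    Returns (records, table): the charbox records to send out, and the
    table kept on the TeX side mapping each ref to (char, font_id).
    """
    records, table = [], {}
    for ref, (char, font_id, wd, ht, dp) in enumerate(nodes):
        table[ref] = (char, font_id)
        records.append(CharBox(ref, wd, ht, dp))
    return records, table


def import_chars(records, table):
    """Restore the original (char, font_id) pairs from returned records."""
    return [table[record.ref] for record in records]
```

The formatter never needs the font metrics for box dimensions this way,
and a font redefinition in the middle of the list does no harm because the
ref, not the font id, identifies the node. Kerning and ligature data are
not covered by this sketch.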

But concerning the metric files, if I want to treat hyphenation locally,
then I also need the kerning and ligature programs. In TeX it is done
too early (and then it is taken apart and (wrongly) reconstructed during
the hyphenation pass). I want to do ligatures and kernings on demand,
basically after hyphenation (it's not that simple, but anyway).

> >(3) is implicit, right?
> >
> > 
> >
> is hyphenation known at that time (if i got it right, tex only looks at
> places where breakpoints make sense, so you don't get all possible

NO. It screws up everything, not only taken or potential breaks, but
even the potential hyphenation points which are never considered a
break. It is also known too late, in the middle of the (atomic)
paragraph breaking process.

> hyphenation points, unless we let tex do a pre break run with a zero 
> hsize so that we get 'm all

No, no, it's much more stupid than you think. TeX first builds the
horizontal list with all kernings and ligatures, taking {} (in
dif{}ferent) into account. Then it tries the first breaking pass with
\pretolerance. If that fails, it takes the whole list, tries to
hyphenate *all* words in the list, inserts explicit \discretionary
nodes at *every* potential hyphenation point and reconstructs the
kernings and ligatures for the segments between the discretionaries,
losing all ligature preventions and yielding potentially incorrect
ligatures and kernings for words which are actually not hyphenated.
Then it tries the second (and maybe third) pass, but it loses the
originally built list forever. The whole breaking is an atomic operation
(happening at \par); you can't do anything between the passes.

Taco, is that correct, or am I too TeX unfriendly?
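As a simplified sketch of the control flow described above, in Python:
`try_break` and `hyphenate_all` are hypothetical callbacks standing in for
TeX's internal routines, and the real code has more details than this
(looseness, for example). The point is only that the hyphenated list
replaces the original one and that nothing can hook in between the passes:

```python
def break_paragraph(hlist, try_break, hyphenate_all,
                    pretolerance, tolerance, emergency_stretch=0.0):
    """Return (pass_number, lines) following TeX's pass order.

    try_break(hlist, threshold, extra_stretch, final) returns the lines,
    or None if the pass fails; a final pass is expected to return lines
    even if they are overfull. Both callbacks are assumptions.
    """
    # Pass 1: no hyphenation, skipped when \pretolerance is negative.
    if pretolerance >= 0:
        lines = try_break(hlist, pretolerance, 0.0, final=False)
        if lines is not None:
            return 1, lines

    # Pass 2: every word is hyphenated and ligatures/kerns are rebuilt.
    # The originally built list is overwritten here and never comes back.
    hlist = hyphenate_all(hlist)
    final = emergency_stretch <= 0
    lines = try_break(hlist, tolerance, 0.0, final=final)
    if lines is not None or final:
        return 2, lines

    # Pass 3: same hyphenated list, with \emergencystretch added.
    return 3, try_break(hlist, tolerance, emergency_stretch, final=True)
```

With an external formatter the interesting places are exactly the ones this
function hides: before `hyphenate_all` overwrites the list, and between the
calls to `try_break`.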

> >* stream of paragraphs
> >
> >I may even need the whole chapter, because I want to treat
> >- shapes and layouts, which are relative to page and not to a particular
> > paragraph
> >- pagination, float placement
> > 
> >
> let's talk of chunks instead of chapters and moving objects instead of 
> floats -)

Maybe we should make a whole new glossary. For example, 'node' is quite
OK for everything in the list (char, box, glue, penalty, ...), but 'list'
is so ambiguous that there should be something more specific (maybe 'node
list'). TeX itself doesn't give clear names (classes) for those objects.
I had to invent names for them in NTS (to name the classes), maybe we can
look into it.

> this is not easy, so that will be a stepwise refinement of the specs; 

Exactly, we have touched it above.

> something like a sequence of master shapes (normally rectangular text 
> areas), frozen forbidden areas (anchored on pages) and movable 
> forbidden/reserved areas (afterwards they may get content overlayed); we 
> need some kind of 'special' mechanism where certain places in the 
> constructed list can get postprocessed/things attached etc


> >For the basic experimenting I can redefine \par to something like
> >\hfil\break\indent but it will restrict all kinds of things which can
> >happen between the paragraphs (in vertical mode). Of course, the whole
> >thing will never be compatible with TeX, because TeX expects after \par
> >that the last paragraph was formatted and placed on the vertical list. So
> >it will be the responsibility of the user/macro-programmer to bear the
> >consequences of using the alternative mechanism. Nevertheless, the
> >consequences should be as small as possible.
> > 
> >
> redefining par will mess up a lot of things

Sure, but I need something now for the prototyping and then some robust
support from luatex for the production.

> >* hyphenation
> > 
> >
> indeed in the end this is needed
> >It will be a lot of additional work, but I think that I should handle it
> >locally. There are two reasons:
> >
> >(1) the protocol for failing and getting the list with new discretionaries 
> >(TeX's 2nd pass) for every individual paragraph would be extremely
> >complicated, in the end it might be more difficult than handling it 
> >locally.
> > 
> >
> indeed messing around with tex's list is painful (reconstructing, 
> ligature mess, etc); i can even imagine that you implement it in such a 
> way that we can use it as alternative for the existing one (basically 
> the simple paragraph variant)

Yes, that's my intention.

> >(2) TeX's hyphenation mechanism is IMHO one of the crappiest parts of TeX.
> >I mean the way the (non)ligatures are screwed up for discretionaries
> >which are not used in the end. So if it is handled locally, it will be IMO
> > 
> well, it works for english, which was the objective; DEK would 
> rightfully react with: then why did nobody adapt it, replace it, etc -)

High time, huh?

It works for English (does it really, always?), because it is simple,
right? I don't know whether it is a real problem in any other language
in practice. I just know the code and I think that it is incorrect,
inconsistent and illogical.

> >simpler and more correct. There are also some research results concerning
> >hyphenation, which are not implemented in TeX, because it would be too
> >complicated.
> >
> right, we need to add things like compound word hyphenation, dictionary 
> support (in order to handle words that don't need the ligatures, etc)


> >In the first stage, I'll omit the hyphenation completely.
> > 
> or maybe some poor man's alternative: let tex give the list with all
> points and remove them when needed (ok, we lost kerning in the process)
> but it may look better than no hyphenation at all -)

Might still be more work than doing the right thing.

On Tue, 20. Sep 2005, 18:39:57, Taco Hoekwater wrote:
> Hans Hagen wrote:
> >
> >How complex is it to implement this?
> It should not be very complicated (not dead simple either), but
> until today this was not even in the top-ten of my todo list ;-)

TeX works with the node lists all the time, it shouldn't be so difficult
to make another kind of register to keep them. Well, everything can turn
tricky inside TeX code. In NTS, it would be trivial ;-).

