Re: [Dev-luatex] plugin for external formatting

21 Sep 2005

      Karel Skoupý wrote:
...
...
\hsize=2in \the\list0 \par % typeset the node list
So \the\list0 will expand to tokens (consistent with \write), right? 
It won't just insert the list on the currently active list (would be
inconsistent with \write), right?
indeed

btw, we have the same situation with lua: 
\lua{tex.print("\string\\relax")} results in just the word \relax being 
typeset, so in order to get it texed we need to feed it into \scantokens

so, i can imagine that there is something like \scanlist\expandafter{\the\toks0}
...
Sure, that's the multiple paragraph (stream) stuff. It will be the
really tricky part, not so much for me, but in TeX, the whole model must
be generalized/extended. It's not yet very clear to anybody, or is it?
I think it's a real research topic.
indeed, stepwise refinement (start small -)
...
OK, but that won't bring much, just some funny shapes.
sure, but on the other hand, it can be used to 'replace' the current par 
builder by a more advanced (e.g. hyphenation) one, imagine that we have:

\paroutput
  {write list to file (or pipe)
   call plugin in one-paragraph mode
   read list from file (or pipe)}

that way we can replace the current par builder, because by default it's 
something equivalent to:

\paroutput{\scanlist\expandafter{\the\list255}}

i wonder how hard this is to implement, you and taco should know -)
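
The write/call/read round trip sketched for \paroutput can be modelled outside TeX. What follows is only a minimal sketch under stated assumptions: the JSON exchange format, the name par_output and the toy greedy breaker standing in for the plugin are all hypothetical, and words stand in for nodes. It shows the data flow through a pipe, not a proposed interface.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for the external plugin. It reads a request
# {"nodes": [...], "hsize": N} from stdin, breaks the words greedily into
# lines of at most N characters, and writes the lines to stdout as JSON.
PLUGIN = r'''
import json, sys
req = json.load(sys.stdin)
lines, cur = [], []
for word in req["nodes"]:
    if cur and len(" ".join(cur + [word])) > req["hsize"]:
        lines.append(cur)
        cur = []
    cur.append(word)
if cur:
    lines.append(cur)
json.dump(lines, sys.stdout)
'''


def par_output(nodes, hsize):
    """Write the list to a pipe, call the plugin once, read the list back."""
    request = json.dumps({"nodes": nodes, "hsize": hsize})
    result = subprocess.run(
        [sys.executable, "-c", PLUGIN],  # plugin in one-paragraph mode
        input=request,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


words = "a plugin can replace the par builder".split()
lines = par_output(words, 12)
for line in lines:
    print(" ".join(line))
```

The calling side only sees a list going out and a list of lines coming back, so the breaker behind the pipe can be swapped without touching the caller.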
...
...
- more boundary conditions
- possible page crossing
Not only page crossing, but also column/shape/container crossing ...
The problem is that we are used to \parshape, which just specifies
something for certain lines in the current paragraph. But if we want to
introduce real page layouts, then the shapes are not relative to the
paragraphs any more. It will be a matter of formatting where a
particular paragraph starts in the layout.
it's a combination:

- a main gutter shape (can be columns or whatever)
- shapes bound to places on the gutter
- shapes bound to specific places in the stream
- shapes that may float (within boundary conditions)
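
To make the difference between a paragraph-relative \parshape and a layout-relative shape concrete, here is a small sketch. It assumes, purely for illustration, that a layout is a list of available line widths; the function name parshape_for and the numbers are hypothetical. The same layout yields different per-paragraph shapes depending on where the paragraph starts.

```python
def parshape_for(layout_widths, start_line, n_lines, default_width):
    """Per-line widths for a paragraph that starts at layout line `start_line`.

    Lines beyond the end of the described layout fall back to `default_width`.
    """
    return [
        layout_widths[i] if i < len(layout_widths) else default_width
        for i in range(start_line, start_line + n_lines)
    ]


# Hypothetical layout: widths in points, with a cut-out on lines 2 to 4.
layout = [200, 200, 120, 120, 120, 200]

top = parshape_for(layout, 0, 3, 200)    # paragraph starting at the top
lower = parshape_for(layout, 4, 3, 200)  # paragraph starting inside the cut-out
print(top)    # [200, 200, 120]
print(lower)  # [120, 200, 200]
```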
...
Sure, that would be great. Then I won't have to access metric files at
all. But should I wait for that? I wanted to start with the \showlists
output for prototyping. Well, I'll see how fast I will progress. Maybe
you'll be faster :-).
ok, i know you don't like messing around with the tex source, but i can 
imagine that this showlist stuff is doable, so if you want, you can 
provide patches to the web source; we're working with a branch of pdftex 
anyway;
...
But concerning the metric files, if I want to treat hyphenation locally,
then I also need the kerning and ligature programs. In TeX it is done
too early (and then it is taken apart and (wrongly) reconstructed during
hyphenation pass). I want to do ligatures and kernings on demand,
basically after hyphenation (it's not that simple, but anyway).
how about a font daemon, that one could cache/access font files; we need 
to go OpenType anyway so maybe such a daemon can be built on top of 
existing (non tex) libraries (port 31415)
...
NO. It screws up everything, not only taken or potential breaks, but
even the potential hyphenation points which are never considered a
break. It is also known too late, in the middle of the (atomic)
paragraph breaking process.
ok, so that's a dead end
...
...
hyphenation points, unless we let tex do a pre break run with a zero 
hsize so that we get them all
No, no, it's much more stupid than you think. TeX first builds the
horizontal list with all kernings and ligatures, taking {} (in
dif{}ferent) into account. Then it tries the first breaking pass with the
\pretolerance. If that fails, then it takes the whole list, tries to
hyphenate *all* words in the lists, inserts the explicit
\discretionaries to *every* potential hyphen and reconstructs the
kernings and ligatures for the segments between the \discretionaries,
losing all ligature preventions and yielding potentially incorrect
ligatures and kernings for words which are actually not hyphenated.
Then it tries the second (and maybe a third) pass, but it loses the
originally built list forever. The whole breaking is an atomic operation
(happening at \par), you can't do anything between the passes.
Taco, is that correct, or am I too TeX unfriendly?
-)

that's indeed too hard-coded for our purpose, so, next to a font daemon, 
we need a hyphenation daemon
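
The loss of ligature prevention described above can be shown with a toy model. This is a deliberately simplified sketch, not TeX's actual reconstitution algorithm: the ligature table, the function names and the flatten-and-rebuild step are assumptions, and discretionaries are left out altogether. It only shows that once the empty group is gone from the node list, a rebuild cannot honour it.

```python
LIGATURES = {"ff", "fi", "fl"}  # toy ligature table (assumption)


def ligate(text):
    """Greedily merge adjacent character pairs that appear in LIGATURES."""
    nodes = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in LIGATURES:
            nodes.append(pair)  # one ligature node
            i += 2
        else:
            nodes.append(text[i])  # one character node
            i += 1
    return nodes


def build_hlist(source):
    """Initial list: ligatures are never formed across an empty group '{}'."""
    nodes = []
    for chunk in source.split("{}"):
        nodes.extend(ligate(chunk))
    return nodes


def rebuild_after_hyphenation(nodes):
    """Toy second pass: flatten to characters and ligate again.

    The node list no longer records where '{}' was, so the prevention is lost.
    """
    return ligate("".join(nodes))


first = build_hlist("shelf{}ful")
second = rebuild_after_hyphenation(first)
print(first)   # ['s', 'h', 'e', 'l', 'f', 'f', 'u', 'l']
print(second)  # ['s', 'h', 'e', 'l', 'ff', 'u', 'l']
```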
...
Maybe we should make a whole new glossary, for example 'node' is quite
OK for everything in the list (char, box, glue, penalty, ...), but 'list'
is so ambiguous, there should be something more specific (maybe 'node
list'). TeX itself doesn't give clear names (classes) for those objects.
I had to make up names for them in NTS (to name the classes), maybe we can
look into it.
good idea; we indeed need to define proper names and descriptions; can 
you make a proposal for that based on your nts experiences?
...
...
well, it works for english, which was the objective; DEK would 
rightfully react with: then why did nobody adapt it, replace it, etc -)
High time, huh?
It works for English (does it really, always?), because it is simple,
right? I don't know whether it is a real problem in any other language
in practice. I just know the code and I think that it is incorrect,
inconsistent and illogical.
my impression is that the number of missed/wrong cases for english is so 
small that it falls within the 'no problem to correct it manually' 
criterion; languages with compound words, accented characters etc have 
higher demands

Hans

-----------------------------------------------------------------
                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
                                             | www.pragma-pod.nl
-----------------------------------------------------------------