Hi all,

as Hans has already mentioned, my concern with luatex is to have some interface/protocol for formatting the TeX stuff externally. Without going into details now, I'm interested in alternative algorithms for formatting not only paragraphs, but the whole stream. For TUG 2005 I wrote a prototype which doesn't use any TeX code at all (it just parasitizes ADvi code for getting some metric information and showing the results). For a long time I planned to make a whole new system from scratch, but for several reasons that was reconsidered, and Hans proposed a way (a plugin mechanism for an external engine) to cooperate with TeX, so that TeX could benefit from the new algorithm and I can concentrate on the core stuff.

So basically I need a stream of (character) boxes, glues, penalties, ... (is there a simple unambiguous notion for all that?) in a preprocessed form (I don't care about input and macro handling) plus some parameters (standard paragraph breaking parameters and the new special ones) and I will return a stream of fixed boxes. 'I' will often mean 'the engine' depending on the context :-)

In the first stage, I won't need lua (or any changes to TeX) at all. I plan to use \showlists for my input stream and to generate a standard TeX input file for reading the result back. Of course, it won't be so simple: there will be some macro programming and trickery, which will make the whole thing complicated, fragile, unreliable, and inefficient for real use. Therefore some hooks from the actively developed TeX will probably be useful for making the cooperation of TeX and the external engine smooth. It might use lua or not, we will see; in any case I would like to keep the plugin support generic and (complete but) minimal.

I will now list the aspects of the communication between TeX and the engine which I have thought of so far. I will be glad if you can just think about it for the moment and give me some feedback if you will.
* single paragraph stuff

I need:

(1) complete representation of all the stuff which is to be returned formatted
(2) sizes of all the objects which are involved in formatting
(3) properties which influence the formatting (breakable, discardable, ...)

It seems that the standard output of \showlists (or \showbox) will mostly do.

(1) is fulfilled I guess (the returned input needs to be only slightly modified to fit TeX).

(2) is a little bit tricky, because for the characters I get only an id of the font. So I will need to know the exact reference to a real font to get the metrics information. This can be learned by e.g. \show\tenrm. But of course it is not known in advance which fonts are used in the paragraph, so either all fonts can be listed at the beginning -- but where to get the list of all font definitions, and the definitions can actually change in the middle of the paragraph -- or I can make a first pass, collect the font ids and ask for them in the second pass. It will be a bit tricky and won't be reliable due to redefinitions (I can also change the current id using \let and lose the old id (still used in the log), right?), so it will be OK for experimenting, but for a real version I will need better support from TeX.

(3) is implicit, right?

* stream of paragraphs

I may need even the whole chapter, because I want to treat

- shapes and layouts, which are relative to page and not to a particular paragraph
- pagination, floats placement

For the basic experimenting I can redefine \par to something like \hfil\break\indent but it will restrict all kinds of things which can happen between the paragraphs (in vertical mode). Of course, the whole thing will never be compatible with TeX, because TeX expects after \par that the last paragraph was formatted and placed on the vertical list. So it will be the responsibility of the user/macro-programmer to bear the consequences of using the alternative mechanism. Nevertheless, the consequences should be as small as possible.
So for the prototyping I can redefine \par or perhaps I can store the whole paragraphs in infinite hboxes (redefining \hsize?) or maybe I can use some \specials for tagging, but for the production version, this will be a very tricky part. Not so much for the engine, but mainly on the TeX side. It should be of a great concern for people who would want to use the new algorithms in their systems (Hans?), (after those ideas are first tested by a prototype :-).

* passing the parameters specific to the new algorithms

- layouts, shapes
- maybe others, like weights for resolving paragraph contra page breaking

This will be a new thing so I hope that there is no compatibility burden.

* hyphenation

It will be a lot of additional work, but I think that I should handle it locally. There are two reasons:

(1) the protocol for failing and getting the list with new discretionaries (TeX's 2nd pass) for every individual paragraph would be extremely complicated, in the end it might be more difficult than handling it locally.

(2) TeX's hyphenation mechanism is IMHO one of the crappiest parts of TeX. I mean the way how the (non)ligatures are screwed up for discretionaries which are not used in the end. So if it is handled locally, it will be IMO simpler and more correct. There are also some research results concerning hyphenation, which are not implemented in TeX, because it would be too complicated.

At the first stage, I'll omit the hyphenation completely.

At the moment, I don't remember anything else. I'm looking forward to your feedback.

--ksk
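A minimal sketch of the "first pass" described above: read a few lines in the style of a \showbox log, turn them into nodes, and collect the font ids that would then have to be resolved (for example via \show). It only understands a small subset of the log format (characters, finite glue, penalties, kerns); everything else, including ligature lines and glue with fil units, is kept as a `Raw` node. All class and function names here are made up for illustration and are not part of TeX or luatex.

```python
import re
from dataclasses import dataclass


@dataclass
class Char:
    font: str  # font identifier as printed in the log, e.g. "tenrm"
    char: str


@dataclass
class Glue:
    width: float
    stretch: float = 0.0
    shrink: float = 0.0


@dataclass
class Penalty:
    value: int


@dataclass
class Kern:
    width: float


@dataclass
class Raw:
    text: str  # any line this simplified parser does not understand


NUM = r"(-?\d+\.\d+)"
GLUE_RE = re.compile(r"\\glue " + NUM + r"(?: plus " + NUM + r")?(?: minus " + NUM + r")?")
PENALTY_RE = re.compile(r"\\penalty (-?\d+)")
KERN_RE = re.compile(r"\\kern ?" + NUM)
CHAR_RE = re.compile(r"\\([A-Za-z]+) (\S+)")


def parse_line(line):
    """Turn one log line into a node; unknown lines become Raw."""
    text = line.lstrip(".")  # leading dots only encode the box nesting depth
    m = GLUE_RE.fullmatch(text)
    if m:
        return Glue(float(m.group(1)), float(m.group(2) or 0), float(m.group(3) or 0))
    m = PENALTY_RE.fullmatch(text)
    if m:
        return Penalty(int(m.group(1)))
    m = KERN_RE.fullmatch(text)
    if m:
        return Kern(float(m.group(1)))
    m = CHAR_RE.fullmatch(text)  # must come last: it is the least specific pattern
    if m:
        return Char(m.group(1), m.group(2))
    return Raw(text)


def fonts_used(nodes):
    """First pass: the font ids that still need to be mapped to real fonts."""
    return sorted({n.font for n in nodes if isinstance(n, Char)})


SAMPLE = [
    r"\hbox(6.94444+0.0)x20.0",
    r".\tenrm H",
    r".\tenrm i",
    r".\glue 3.33333 plus 1.66666 minus 1.11111",
    r".\penalty 10000",
    r".\kern-0.27779",
    r".\tenbf x",
]
NODES = [parse_line(line) for line in SAMPLE]
```

The font ids returned by `fonts_used(NODES)` are only log names, so the redefinition problem mentioned above (\let, fonts changing in mid-paragraph) is not solved by this; it just shows where the second pass would start.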
Hi Karel,

Karel Skoupý wrote:
Hi all,
as Hans has already mentioned, my concern with luatex is to have some interface/protocol for formatting the TeX stuff externally.
.. [snip] ...
At the moment, I don't remember anything else. I'm looking forward to your feedback.
It will take some time before I fully understand your post, but I want to bring up something that Hans and I have talked about recently, namely the addition of 'nodelist registers' analogous to \toks registers.

The idea was to have registers and read/write syntax to allow things like this (rough ideas, api may change yet):

  \list0=\unhbox0        % now \list0 contains a node list
  \list1={Hello world!}  % now \list1 also contains a node list
  \list2={\hsize=12in }  % error: only node-building allowed

  \write16{\the\list0 }  % like \showlists, but using a fully
                         % restorable read syntax

  \hsize=2in
  \the\list0 \par        % typeset the node list

  \noindent hello world!
  \list0=\lastlist       % gobbles the node list from this par
                         % before anything is done to it.

Cheers, Taco
Taco Hoekwater wrote:
The idea was to have registers and read/write syntax to allow things like this (rough ideas, api may change yet):
  \list0=\unhbox0        % now \list0 contains a node list
  \list1={Hello world!}  % now \list1 also contains a node list
  \list2={\hsize=12in }  % error: only node-building allowed

  \write16{\the\list0 }  % like \showlists, but using a fully
                         % restorable read syntax

  \hsize=2in
  \the\list0 \par        % typeset the node list

  \noindent hello world!
  \list0=\lastlist       % gobbles the node list from this par
                         % before anything is done to it.
How complex is it to implement this? Hans ----------------------------------------------------------------- Hans Hagen | PRAGMA ADE Ridderstraat 27 | 8061 GH Hasselt | The Netherlands tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl -----------------------------------------------------------------
On Tue, 20 Sep 2005, 13:01:23, Taco Hoekwater wrote:
At the moment, I don't remember anything else. I'm looking forward to your feedback.
It will take some time before I fully understand your post, but I want to bring up something that Hans and I have talked about recently, namely the addition of 'nodelist registers' analogous to \toks registers.
That sounds all great.
The idea was to have registers and read/write syntax to allow things like this (rough ideas, api may change yet):
  \list0=\unhbox0        % now \list0 contains a node list
  \list1={Hello world!}  % now \list1 also contains a node list
  \list2={\hsize=12in }  % error: only node-building allowed

  \write16{\the\list0 }  % like \showlists, but using a fully
                         % restorable read syntax
Be careful about the redefinitions of the fonts in the middle of the list. It can be either forbidden or reproduced in the read. For me the fully restorable read syntax is very important (can I get all the information from standard \showlists now?). I also need all the context information (font definitions, some parameters). It can be passed as an extra chunk, if we figure out some protocol. But maybe it can also be all inlined (see the char dimensions below); then it won't be a restorable read syntax (there will be too much), but maybe we can have \export16\list0 which dumps really everything. I don't mind filtering out the extras when preparing my output (TeX's input).
\hsize=2in \the\list0 \par % typeset the node list
So \the\list0 will expand to tokens (consistent with \write), right? It won't just insert the list on the currently active list (would be inconsistent with \write), right?
  \noindent hello world!
  \list0=\lastlist       % gobbles the node list from this par
                         % before anything is done to it.
On Tue, 20 Sep 2005, 16:03:08, Hans Hagen wrote:
* single paragraph stuff
I need: (1) complete representation of all the stuff which is to be returned formatted (2) sizes of all the objects which are involved in formatting (3) properties which influence the formatting (breakable, discardable, ...)
if this paragraph crosses a page, you may need to know the available room as well, so things like pagegoal and pagetotal also need to be communicated (maybe also left/right page state if the shape is page dependent i.e. inner or outer margin bound)
Sure, that's the multiple paragraph (stream) stuff. It will be the really tricky part, not so much for me, but in TeX, the whole model must be generalized/extended. It's not yet very clear to anybody, or is it? I think it's a real research topic.
It seems that the standard output of \showlists (or \showbox) will mostly do. (1) is fulfilled I guess (the returned input needs to be only slightly modified to fit TeX).
it will probably do for the first prototype; i can imagine that you implement several strategies,
- simple paragraph, on page
OK, but that won't bring much, just some funny shapes.
- more boundary conditions - possible page crossing
Not only page crossing, but also column/shape/container crossing ... The problem is that we are used to \parshape, which just specifies something for certain lines in the current paragraph. But if we want to introduce real page layouts, then the shapes are not relative to the paragraphs any more. It will be a matter of formatting where a particular paragraph starts in the layout.
(2) is a little bit tricky, because for the characters I get only an id of the font. So I will need to know the exact reference to a real font to get the metrics information. This can be learned by e.g. \show\tenrm. But of course it is not known in advance which fonts are used in the paragraph, so either all fonts can be listed at the beginning -- but where to get the list of all font definitions, and the definitions can actually change in the middle of the paragraph -- or I can make a first pass, collect the font ids and ask for them in the second pass. It will be a bit tricky and won't be reliable due to redefinitions (I can also change the current id using \let and lose the old id (still used in the log), right?), so it will be OK for experimenting, but for a real version I will need better support from TeX.
Why not resolve that info beforehand? Since the order does not change, I can imagine passing chars as some kind of special charbox (wd, dp, ht + ref) and when reading back, the ref can be used to insert the char node again; we don't need to save bytes -)
Sure, that would be great. Then I won't have to access metric files at all. But should I wait for that? I wanted to start with the \showlists output for prototyping. Well, I'll see how fast I progress. Maybe you'll be faster :-). But concerning the metric files, if I want to treat hyphenation locally, then I also need the kerning and ligature programs. In TeX it is done too early (and then it is taken apart and (wrongly) reconstructed during hyphenation pass). I want to do ligatures and kernings on demand, basically after hyphenation (it's not that simple, but anyway).
(3) is implicit, right?
is hyphenation known at that time (if i got it right, tex only looks at places where breakpoints make sense, so you don't get all possible
NO. It screws up everything, not only taken or potential breaks, but even the potential hyphenation points which are never considered a break. It is also known too late, in the middle of the (atomic) paragraph breaking process.
hyphenation points, unless we let tex do a pre break run with a zero hsize so that we get 'm all
No, no, it's much more stupid than you think. TeX first builds the horizontal list with all kernings and ligatures, taking {} (in dif{}ferent) into account. Then it tries the first breaking pass with the \pretolerance. If that fails, then it takes the whole list, tries to hyphenate *all* words in the list, inserts explicit \discretionaries at *every* potential hyphen and reconstructs the kernings and ligatures for the segments between the \discretionaries, losing all ligature preventions and yielding potentially incorrect ligatures and kernings for words which are actually not hyphenated. Then it tries the second (and maybe third) pass, but it loses the originally built list forever. The whole breaking is an atomic operation (happening at \par), you can't do anything between the passes. Taco, is that correct, or am I too TeX unfriendly?
* stream of paragraphs
I can need even the whole chapter, because I want to treat

- shapes and layouts, which are relative to page and not to a particular paragraph
- pagination, floats placement
let's talk of chunks instead of chapters and moving objects instead of floats -)
Maybe we should make a whole new glossary, for example 'node' is quite OK for everything in the list (char, box, glue, penalty, ...), but 'list' is so ambiguous, there should be something more specific (maybe 'node list'). TeX itself doesn't give clear names (classes) for those objects. I had to make up names for them in NTS (to name the classes), maybe we can look into it.
this is not easy, so that will be a stepwise refinement of the specs;
Exactly, we have touched it above.
something like a sequence of master shapes (normally rectangular text areas), frozen forbidden areas (anchored on pages) and movable forbidden/reserved areas (afterwards they may get content overlayed); we need some kind of 'special' mechanism where certain places in the constructed list can get postprocessed/things attached etc
Yes.
For the basic experimenting I can redefine \par to something like \hfil\break\indent but it will restrict all kinds of things which can happen between the paragraphs (in vertical mode). Of course, the whole thing will never be compatible with TeX, because TeX expects after \par that the last paragraph was formatted and placed on the vertical list. So it will be the responsibility of the user/macro-programmer to bear the consequences of using the alternative mechanism. Nevertheless, the consequences should be as small as possible.
redefining par will mess up a lot of things
Sure, but I need something now for the prototyping and then some robust support from luatex for the production.
* hyphenation
indeed in the end this is needed
It will be a lot of additional work, but I think that I should handle it locally. There are two reasons:
(1) the protocol for failing and getting the list with new discretionaries (TeX's 2nd pass) for every individual paragraph would be extremely complicated, in the end it might be more difficult than handling it locally.
indeed messing around with tex's list is painful (reconstructing, ligature mess, etc); i can even imagine that you implement it in such a way that we can use it as alternative for the existing one (basically the simple paragraph variant)
Yes, that's my intention.
(2) TeX's hyphenation mechanism is IMHO one of the crappiest parts of TeX. I mean the way how the (non)ligatures are screwed up for discretionaries which are not used in the end. So if it is handled locally, it will be IMO
well, it works for english, which was the objective; DEK would rightfully react with: then why did nobody adapt it, replace it, etc -)
High time, huh? It works for English (does it really always ?), because it is simple, right? I don't know, whether it is a real problem in any other language in practice. I just know the code and I think that it is incorrect, inconsistent and illogical.
simpler and more correct. There are also some research results concerning hyphenation, which are not implemented in TeX, because it would be too complicated.
right, we need to add things like compound word hyphenation, dictionary support (in order to handle words that don't need the ligatures, etc)
OK.
At the first stage, I'll omit the hyphenation completely.
or maybe some poor man's alternative: let tex give the list with all points and remove them when needed (ok, we lost kerning in the process) but it may look better than no hyphenation at all -)
Might be still more work than doing the right thing.

On Tue, 20 Sep 2005, 18:39:57, Taco Hoekwater wrote:
Hans Hagen wrote:
How complex is it to implement this?
It should not be very complicated (not dead simple either), but until today this was not even in the top ten of my todo list ;-)
TeX works with the node lists all the time, it shouldn't be so difficult to make another kind of register to keep them. Well, everything can turn tricky inside TeX code. In NTS, it would be trivial ;-). --ksk
Karel Skoupý wrote:
\hsize=2in \the\list0 \par % typeset the node list
So \the\list0 will expand to tokens (consistent with \write), right? It won't just insert the list on the currently active list (would be inconsistent with \write), right?
indeed

btw, we have the same situation with lua:

  \lua{tex.print("\string\\relax")}

results in just the word \relax being typeset, so in order to get it texed we need to feed it into \scantokens

so, i can imagine that there is something

  \scanlist\expandafter{\the\toks0}
Sure, that's the multiple paragraph (stream) stuff. It will be the really tricky part, not so much for me, but in TeX, the whole model must be generalized/extended. It's not yet very clear to anybody, or is it? I think it's a real research topic.
indeed, stepwise refinement (start small -)
OK, but that won't bring much, just some funny shapes.
sure, but on the other hand, it can be used to 'replace' the current par builder by a more advanced (e.g. hyphenation) one, imagine that we have:

  \paroutput
    {write list to file (or pipe)
     call plugin in one-paragraph mode
     read list from file (or pipe)}

that way we can replace the current par builder, because by default it's something equivalent to:

  \paroutput{\scanlist\expandafter{\the\list255}}

i wonder how hard this is to implement, you and taco should know -)
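To make the "one-paragraph mode" round trip concrete, here is a minimal sketch of what the plugin side could look like. It is only a stand-in: the breaker is a plain first-fit (greedy) one on natural widths, not the new algorithm discussed in this thread, and the line-based text protocol ("box <width> <ref>", "glue <width>") is invented for the example. None of these names exist in TeX or luatex.

```python
def parse(text):
    """Read the hypothetical protocol: one node per line."""
    nodes = []
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue
        if fields[0] == "box":
            nodes.append(("box", float(fields[1]), fields[2]))  # width, ref
        elif fields[0] == "glue":
            nodes.append(("glue", float(fields[1])))  # natural width only
        else:
            raise ValueError("unknown node: " + raw)
    return nodes


def first_fit(nodes, hsize):
    """Greedy breaking: start a new line when the next box does not fit."""
    lines, current, width = [], [], 0.0
    pending_glue = None
    for node in nodes:
        if node[0] == "glue":
            if current:  # glue at the start of a line is discarded
                pending_glue = node
            continue
        extra = pending_glue[1] if pending_glue else 0.0
        if current and width + extra + node[1] > hsize:
            lines.append(current)  # the glue at the break is dropped
            current, width = [node], node[1]
        else:
            if pending_glue:
                current.append(pending_glue)
            current.append(node)
            width += extra + node[1]
        pending_glue = None
    if current:
        lines.append(current)
    return lines


def run_plugin(text, hsize):
    """One paragraph in, one output line of box refs per typeset line."""
    lines = first_fit(parse(text), hsize)
    return "\n".join(
        " ".join(node[2] for node in line if node[0] == "box") for line in lines
    )


PARAGRAPH = "box 30 a\nglue 10\nbox 30 b\nglue 10\nbox 30 c\n"
RESULT = run_plugin(PARAGRAPH, 75.0)
```

The TeX side would then only have to write the list, call something like `run_plugin`, and read the refs back, which is the three-step body of the \paroutput idea; stretch, shrink, penalties and hyphenation are all left out here.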
- more boundary conditions - possible page crossing
Not only page crossing, but also column/shape/container crossing ... The problem is that we are used to \parshape, which just specifies something for certain lines in the current paragraph. But if we want to introduce real page layouts, then the shapes are not relative to the paragraphs any more. It will be a matter of formatting where a particular paragraph starts in the layout.
it's a combination:

- a main gutter shape (can be columns or whatever)
- shapes bound to places on the gutter
- shapes bound to specific places in the stream
- shapes that may float (within boundary condition)
Sure, that would be great. Then I won't have to access metric files at all. But should I wait for that? I wanted to start with the \showlists output for prototyping. Well, I'll see how fast I progress. Maybe you'll be faster :-).
ok, i know you don't like messing around with the tex source, but i can imagine that this showlist stuff is doable, so if you want, you can provide patches to the web source; we're working with a branch of pdftex anyway;
But concerning the metric files, if I want to treat hyphenation locally, then I also need the kerning and ligature programs. In TeX it is done too early (and then it is taken apart and (wrongly) reconstructed during hyphenation pass). I want to do ligatures and kernings on demand, basically after hyphenation (it's not that simple, but anyway).
how about a font daemon, that one could cache/access font files; we need to go open type anyway so maybe such a daemon can be built on top of existing (non tex) libraries (port 31415)
NO. It screws up everything, not only taken or potential breaks, but even the potential hyphenation points which are never considered a break. It is also known too late, in the middle of the (atomic) paragraph breaking process.
ok, so that's a dead end
hyphenation points, unless we let tex do a pre break run with a zero hsize so that we get 'm all
No, no, it's much more stupid than you think. TeX first builds the horizontal list with all kernings and ligatures, taking {} (in dif{}ferent) into account. Then it tries the first breaking pass with the \pretolerance. If that fails, then it takes the whole list, tries to hyphenate *all* words in the list, inserts explicit \discretionaries at *every* potential hyphen and reconstructs the kernings and ligatures for the segments between the \discretionaries, losing all ligature preventions and yielding potentially incorrect ligatures and kernings for words which are actually not hyphenated. Then it tries the second (and maybe third) pass, but it loses the originally built list forever. The whole breaking is an atomic operation (happening at \par), you can't do anything between the passes.
Taco, is that correct, or am I too TeX unfriendly?
-) that's indeed too hard-coded for our purpose, so, next to a font daemon, we need a hyphenation daemon
Maybe we should make a whole new glossary, for example 'node' is quite OK for everything in the list (char, box, glue, penalty, ...), but 'list' is so ambiguous, there should be something more specific (maybe 'node list'). TeX itself doesn't give clear names (classes) for those objects. I had to make up names for them in NTS (to name the classes), maybe we can look into it.
good idea; we indeed need to define proper names and descriptions; can you make a proposal for that based on your nts experiences?
well, it works for english, which was the objective; DEK would rightfully react with: then why did nobody adapt it, replace it, etc -)
High time, huh?
It works for English (does it really always ?), because it is simple, right? I don't know, whether it is a real problem in any other language in practice. I just know the code and I think that it is incorrect, inconsistent and illogical.
my impression is that the number of missed/wrong cases for english is so small that it falls within the 'no problem to correct it manually' criteria; languages with compound words, accented characters etc have higher demands

Hans
Karel Skoupý wrote:
Be careful about the redefinitions of the fonts in the middle of the list. It can be either forbidden or reproduced in the read.
For me the fully restorable read syntax is very important (can I get all [...]
I believe all extra parameters had better be in-line, for optimal flexibility. As much as possible, at least: some information is irretrievably lost in current TeX. Quite a lot can be solved by adding a new read syntax for character and language nodes, one that does not depend on font and language id numbers. It'll be rather verbose and a tad slow, that is the price you pay for extra flexibility.
\hsize=2in \the\list0 \par % typeset the node list
So \the\list0 will expand to tokens (consistent with \write), right? It won't just insert the list on the currently active list (would be inconsistent with \write), right?
Yes. I was aiming to be consistent with other uses of \the (easier that way). For direct insertion, something like \unlist0 would be needed (analogous to hbox operation).
But concerning the metric files, if I want to treat hyphenation locally, then I also need the kerning and ligature programs. In TeX it is done too early (and then it is taken apart and (wrongly) reconstructed during hyphenation pass). I want to do ligatures and kernings on demand, basically after hyphenation (it's not that simple, but anyway).
In current TeX, it is not done too early: ligkerns can influence which line breaks are chosen, so the ligkern programs have to be applied first thing. Only the manner in which it is done is not quite as general as should have been, resulting in the (sometimes) incorrect reconstruction of ligatures.
is hyphenation known at that time (if i got it right, tex only looks at places where breakpoints make sense, so you don't get all possible
NO. It screws up everything, not only taken or potential breaks, but even the potential hyphenation points which are never considered a break.
It does all potential hyphenation points, but that is still a subset of all hyphenation points: absolutely impossible points are ignored (like in the middle of the first line). At least, that's what Knuth's web comments say, and note that this is not a feature of the algorithm, only an optimization.
..[]... Taco, is that correct, or am I too TeX unfriendly?
Perhaps just a little, but you have a valid case ;-)
It works for English (does it really always ?), because it is simple,
Considering how strange the code is, it works fairly well for a surprising number of languages.
right? I don't know, whether it is a real problem in any other language in practice. I just know the code and I think that it is incorrect, inconsistent and illogical.
It is also near-impossible to fix while maintaining compatibility, which is probably why no-one has seriously attempted to clean up the code, up-til-now. Greetings, Taco
Taco Hoekwater wrote:
is hyphenation known at that time (if i got it right, tex only looks at places where breakpoints make sense, so you don't get all possible
NO. It screws up everything, not only taken or potential breaks, but even the potential hyphenation points which are never considered a break.
It does all potential hyphenation points, but that is still a subset of all hyphenation points: absolutely impossible points are ignored (like in the middle of the first line). At least, that's what Knuth's web comments say, and note that this is not a feature of the algorithm, only an optimization.
so what happens if you remove the optimizations (forget about 100% compatibility)
It is also near-impossible to fix while maintaining compatibility, which is probably why no-one has seriously attempted to clean up the code, up-til-now.
but we don't care much about that part of compatibility, do we?

Hans
Hans Hagen wrote:
so what happens if you remove the optimizations (forget about 100% compatibility)
Probably (hopefully) nothing except some bloat in the data structure, but I won't take bets on that.
It is also near-impossible to fix while maintaining compatibility, which is probably why no-one has seriously attempted to clean up the code, up-til-now.
but we don't care much about that part of compatibility, do we?
Nah. (but it was a big issue for etex, nts, and pdftex-in-dvi mode) Taco
Karel Skoupý wrote:
* single paragraph stuff
I need: (1) complete representation of all the stuff which is to be returned formatted (2) sizes of all the objects which are involved in formatting (3) properties which influence the formatting (breakable, discardable, ...)
if this paragraph crosses a page, you may need to know the available room as well, so things like pagegoal and pagetotal also need to be communicated (maybe also left/right page state if the shape is page dependent i.e. inner or outer margin bound)
It seems that the standard output of \showlists (or \showbox) will mostly do. (1) is fulfilled I guess (the returned input needs to be only slightly modified to fit TeX).
it will probably do for the first prototype; i can imagine that you implement several strategies,

- simple paragraph, on page
- more boundary conditions
- possible page crossing

etc
(2) is a little bit tricky, because for the characters I get only an id of the font. So I will need to know the exact reference to a real font to get the metrics information. This can be learned by e.g. \show\tenrm. But of course it is not known in advance which fonts are used in the paragraph, so either all fonts can be listed at the beginning -- but where to get the list of all font definitions, and the definitions can actually change in the middle of the paragraph -- or I can make a first pass, collect the font ids and ask for them in the second pass. It will be a bit tricky and won't be reliable due to redefinitions (I can also change the current id using \let and lose the old id (still used in the log), right?), so it will be OK for experimenting, but for a real version I will need better support from TeX.
Why not resolve that info beforehand? Since the order does not change, I can imagine passing chars as some kind of special charbox (wd, dp, ht + ref) and when reading back, the ref can be used to insert the char node again; we don't need to save bytes -)
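The charbox idea can be sketched as follows: the exporting side keeps a table from ref to the real character node, and the engine only ever sees dimensions plus that opaque ref. This is a minimal illustration under assumptions; the `charbox` line format, the field order and all names are hypothetical, and the dimensions in the usage note are made-up numbers, not real font metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharBox:
    wd: float
    ht: float
    dp: float
    ref: int  # opaque handle; only the exporting side knows what it means


class Exporter:
    """Stands in for the TeX side: resolves metrics before export."""

    def __init__(self):
        self._table = []  # ref -> (font, char), kept on the TeX side only

    def export(self, font, char, wd, ht, dp):
        self._table.append((font, char))
        return CharBox(wd, ht, dp, ref=len(self._table) - 1)

    def restore(self, box):
        """Reading back: the ref is enough to re-insert the char node."""
        return self._table[box.ref]


def to_line(box):
    # Hypothetical wire format: "charbox <wd> <ht> <dp> <ref>"
    return f"charbox {box.wd} {box.ht} {box.dp} {box.ref}"


def from_line(line):
    tag, wd, ht, dp, ref = line.split()
    if tag != "charbox":
        raise ValueError("not a charbox line: " + line)
    return CharBox(float(wd), float(ht), float(dp), int(ref))
```

For example, `exp = Exporter(); box = exp.export("tenrm", "H", 7.5, 6.8, 0.0)` followed by `exp.restore(from_line(to_line(box)))` gives back `("tenrm", "H")`, so the engine never needs to open a metric file for plain box placement. Kerning and ligature data are not covered by this, which is the point raised in the reply about local hyphenation.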
(3) is implicit, right?
is hyphenation known at that time (if i got it right, tex only looks at places where breakpoints make sense, so you don't get all possible hyphenation points, unless we let tex do a pre break run with a zero hsize so that we get 'm all)
* stream of paragraphs
I can need even the whole chapter, because I want to treat

- shapes and layouts, which are relative to page and not to a particular paragraph
- pagination, floats placement
let's talk of chunks instead of chapters and moving objects instead of floats -) this is not easy, so that will be a stepwise refinement of the specs; something like a sequence of master shapes (normally rectangular text areas), frozen forbidden areas (anchored on pages) and movable forbidden/reserved areas (afterwards they may get content overlayed); we need some kind of 'special' mechanism where certain places in the constructed list can get postprocessed/things attached etc
For the basic experimenting I can redefine \par to something like \hfil\break\indent but it will restrict all kinds of things which can happen between the paragraphs (in vertical mode). Of course, the whole thing will never be compatible with TeX, because TeX expects after \par that the last paragraph was formatted and placed on the vertical list. So it will be the responsibility of the user/macro-programmer to bear the consequences of using the alternative mechanism. Nevertheless, the consequences should be as small as possible.
redefining par will mess up a lot of things
So for the prototyping I can redefine \par or perhaps I can store the whole paragraphs in infinite hboxes (redefining \hsize?) or maybe I can use some \specials for tagging, but for the production version, this will be a very tricky part. Not so much for the engine, but mainly on the TeX side. It should be of a great concern for people who would want to use the new algorithms in their systems (Hans?), (after those ideas are first tested by a prototype :-).
see taco's mail, we should build a list writer
* passing the parameters specific to the new algorithms
- layouts, shapes
- maybe others, like weights for resolving paragraph contra page breaking
This will be a new thing so I hope that there is no compatibility burden.
* hyphenation
indeed in the end this is needed
It will be a lot of additional work, but I think that I should handle it locally. There are two reasons:
(1) the protocol for failing and getting the list with new discretionaries (TeX's 2nd pass) for every individual paragraph would be extremely complicated, in the end it might be more difficult than handling it locally.
indeed messing around with tex's list is painful (reconstructing, ligature mess, etc); i can even imagine that you implement it in such a way that we can use it as alternative for the existing one (basically the simple paragraph variant)
(2) TeX's hyphenation mechanism is IMHO one of the crappiest parts of TeX. I mean the way how the (non)ligatures are screwed up for discretionaries which are not used in the end. So if it is handled locally, it will be IMO
well, it works for english, which was the objective; DEK would rightfully react with: then why did nobody adapt it, replace it, etc -)
simpler and more correct. There are also some research results concerning hyphenation, which are not implemented in TeX, because it would be too complicated.
right, we need to add things like compound word hyphenation, dictionary support (in order to handle words that don't need the ligatures, etc)
At the first stage, I'll omit the hyphenation completely.
or maybe some poor man's alternative: let tex give the list with all points and remove them when needed (ok, we lost kerning in the process) but it may look better than no hyphenation at all -)

Hans