[Aleph] RfC: Boustrophedon typesetting

Daniel Richard G. aleph@ntg.nl
Sun, 14 Mar 2004 04:44:23 -0500


Hello, one and all. I'm a newcomer to the Aleph project. My interest
centers on the possibility of contributing one particular feature that is
not [to my knowledge] available in any other existing typesetting system.
This message is a request for comments on said potential feature:


BOUSTROPHEDON TYPESETTING


1. Introduction

Boustrophedon is a little-used means of writing text in which lines run
alternately left-to-right and right-to-left. The right-to-left lines are
rendered as though between e-TeX's \beginR and \endR; that is, a
mirrored version of how they would otherwise be written in normal
left-to-right mode. (Even the characters are mirrored, so a different font
would be used, a la xbmc10.)

Some quick links to illustrate the idea:

	http://www.iskunk.org/tmp/sample-1.pdf
	  (sample text generated with the "boust" program)

	http://traevoli.com/boust/screen.php
	  (screenshots page for the "boust" program)

	http://en.wikipedia.org/wiki/Boustrophedon
	  (some background and history)

Closely related to the above is "rongorongo" layout, which I believe would
require \epsilon more effort to support. It follows the same alternating
LtR/RtL scheme, except that the RtL lines are rendered not by mirroring the
line, but by rotating it 180 degrees (!). An example:

	http://www.iskunk.org/tmp/sample-2.pdf

(The term "rongorongo" is borrowed from the name of an Easter Island script
which was written in this manner. I don't know if this is the best term to
use, but I use it for lack of a better one.)


2. Layout (justification)

Because boustrophedon is read in a manner dramatically different from
normal all-left-to-right or all-right-to-left text, it has different
requirements as far as basic layout is concerned. The most critical one is
this:

	THE ALIGNMENT RULE: At any given line break within a paragraph, the
	beginning of the lower line must be vertically aligned with the end
	of the upper line.

An example: I will use "]]]]" to indicate left-to-right text, and "[[[[" to
indicate right-to-left. Can you tell why the following ragged-right
boustrophedon text (as might be produced by an unmodified TeX) would be
hard to read?

	]] ]]]]]]]] ]]] ]]]]] ]]]]]]] ] ]]]] ]]]]]] ]]]]]]]]
	[[[[ [[[[[ [[ [[[[[ [[[ [[[[[[[ [[[[[[[ [[[[[[
	]]]]]]]]]]]]]]]] ]]] ]]] ]]]] ]]]]]] ]] ]]] ]]]]]]]
	[[[[[[[[ [[ [[[ [[[[[[ [[[[[[[[ [[[[[[ [[[[[[[ [[[[[[
	]]]] ]]]]]] ]] ]]] ]]]]] ]]] ]]]]]]]] ]]]]] ]] ]]]]]]]]
	[[[[ [[[ [[[[[[[[ [[[[[ [[[[ [[[[[[[

At the end of every left-to-right line, the eye has to travel by a varying
amount (in a varying direction, even) to get to the beginning of the
subsequent right-to-left line. This gets tiring, and can easily be cited as
poor typesetting design.

The above rule suggests how we should shift the lines to fix the problem:

	]] ]]]]]]]] ]]] ]]]]] ]]]]]]] ] ]]]] ]]]]]] ]]]]]]]]
	      [[[[ [[[[[ [[ [[[[[ [[[ [[[[[[[ [[[[[[[ [[[[[[
	      ]]]]]]]]]]]]]]]] ]]] ]]] ]]]] ]]]]]] ]] ]]] ]]]]]]]
	    [[[[[[[[ [[ [[[ [[[[[[ [[[[[[[[ [[[[[[ [[[[[[[ [[[[[[
	    ]]]] ]]]]]] ]] ]]] ]]]]] ]]] ]]]]]]]] ]]]]] ]] ]]]]]]]]
	                       [[[[ [[[ [[[[[[[[ [[[[[ [[[[ [[[[[[[

We now have the most basic form of boustrophedon layout. It is analogous to
\raggedright, in that it requires no adjustments to any interword spacing.
I call this "doubly-ragged justification," owing to the two ragged margins
that it produces. This isn't likely to see use in any fine books, but it
would be an omission not to offer this as an option. (Note that the above
example isn't terribly good; a typesetter would obviously have chosen
better line breaks!)

(Oh, and FYI: This is the layout used in the "boust"-generated sample pages
linked to above. In fact, the program offers no other option.)
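
As a rough illustration of the alignment rule, here is a minimal Python
sketch that computes where each line of a doubly-ragged paragraph lands. It
is not taken from the "boust" program; the function name, and the
assumptions that line widths are already known and that the first line runs
left-to-right from x = 0, are made up for this example.

```python
def doubly_ragged_offsets(widths, start=0):
    """Return the left-edge x position of each line.

    widths -- natural width of each line, top to bottom
    start  -- x position where the first (left-to-right) line begins
    Even-numbered lines run left-to-right, odd-numbered right-to-left.
    """
    offsets = []
    pen = start  # x position where the next line begins to be read
    for i, w in enumerate(widths):
        if i % 2 == 0:
            # Left-to-right: begins at pen, ends at pen + w.
            offsets.append(pen)
            pen += w
        else:
            # Right-to-left: begins at pen, ends at pen - w.
            pen -= w
            offsets.append(pen)
    return offsets


# Widths (in characters) of the first four lines of the shifted example.
offsets = doubly_ragged_offsets([52, 46, 51, 53])
```

Here `offsets` comes out as [0, 6, 6, 4], which matches the indentation of
the first four lines in the shifted example above. The sketch does not
check that a line stays inside the text block; a real implementation would
have to.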

Other possibilities? There is "singly-ragged justification," which would
give something like

	]]]] ]]] ]]]]] ] ]]]] ]] ]]]]] ]]]]] ]] ]]] ]]]]]]]. ] ]]]]]
	[[[[[  [  [[  [[[[[  [[[[[[  [[[[[[  [[[  [[[  [[[[[[[  [[[[
	]]]]]] ]]]]]]]]]. ]]]]]] ]] ]]]]] ]] ] ]]]]]]]]]]. ]]] ]]]]] ]]
	[[[[[ [[[[[ [[[[[[[[ [[[[[[ [[[[ [[ [[ [[[[[[[[[ [[[ [[[[ [[[[[
	]]]]  ]]]  ]]]]]]  ]]]]]]]]]  ]]]]  ]]]  ]]]]]].  ]]]  ]]]]]]
	                            .[[[[[[ [[ [ [[[[ [[[[[ [[[[[[[[[

So some extra interword spacing is added, to yield one straight margin. A
boustrophedon TeX/LaTeX format might have some sort of directive so that in
book setting, the straight margin always goes on the side of the page
opposite the spine. Likewise for the direction of the first line of a
paragraph (i.e. always running toward the spine).
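
The arithmetic behind the extra interword spacing can be sketched in a few
lines of Python. This is only an illustration under simplifying
assumptions: character-cell widths, space distributed evenly, and no limit
on how far a space may stretch (real TeX glue has one). The function name
is hypothetical.

```python
def justified_gap(word_widths, target_width):
    """Interword space that makes the words span exactly target_width."""
    gaps = len(word_widths) - 1
    if gaps < 1:
        raise ValueError("need at least two words to distribute space")
    return (target_width - sum(word_widths)) / gaps


# Second line of the singly-ragged example: ten words that must span the
# 60 characters between the end of the first line and the straight margin.
second_line = [5, 1, 2, 5, 6, 6, 3, 3, 7, 4]
gap = justified_gap(second_line, 60)
```

The ten words total 42 characters and have nine gaps between them, so `gap`
is 2.0, i.e. the doubled spaces seen in the second line of the example.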

Then, of course, there's the flawless full justification that TeX is so
good at producing:

	]]]]  ]]]  ]]]]]  ]  ]]]]  ]]  ]]]]]  ]]]]  ]]  ]]]  ]]]]]].  ]]]
	[[[[ [[[[[ [[ [[[[[ [[ [[[[[ [[[[[[ [[[[[[ [[[ [[[ [[[[[[[ [[[[[[
	]]]]]]  ]]]]]]]]]].  ]]]]]]  ]]]  ]]]]]  ]]  ]]]]]]]]].  ]]]  ]]]
	[[[[[[  [[[[[  [[[[[[[[  [[[[[[[  [[[[  [[  [[  [[[[[[[[  [[[[[[[
	]]]] ]]] ]]]]]] ]]]]]]]] ]] ]]]] ]]] ]]]]]]]] ]]] ]]]]]] ]]]]]]]]
	                               .[[[[[[ [[ [ [[[[ [[[[[ [[ [[[[[[[

There are undoubtedly some variations on the above schemes. One that the
"boust" program implements is an extension of the aforementioned rule, such
that a paragraph---if it is preceded by another paragraph---begins its
first line directly below where the last line of the previous paragraph
ended, and in the same direction. An example which shows this clearly:

	http://www.iskunk.org/tmp/sample-3.pdf

It doesn't look too good here, but imagine if the inter-paragraph spacing
were reduced, such that the paragraphs snug up a bit into each other. That
could potentially be done in an eye-friendly way.

A separate tweak might be to reverse the paragraph's first line if it
begins close to the margin, so you don't get e.g. that very short first
line---"In the"---on the fourth paragraph.

I don't know of any other variations, unfortunately. Boustrophedon is
already a very esoteric topic, and fine boustrophedon typesetting seems
almost nonexistent as a subject. (Try Googling for it sometime }:)


3. Hyphenation

The distinct nature of boustrophedon invites a rethink of how hyphenation
is indicated in the final typeset copy. Do we use the same convention as
in normal typesetting, where the word is split at some acceptable point,
the first part is appended with a hyphen character, and the second part is
bumped down to the start of the next line?

We could do that, but as the eye has to travel only a small amount to reach
the remainder of the word (as opposed to going all the way back across the
paragraph), I would like to make possible an alternative convention. See
the second paragraph in this image:

	http://www.iskunk.org/tmp/boustrophedon.png

The words are joined by a bracket, with legs of unequal length. (Note that
this is quite a crude rendition; the bracket really should be narrower.)
This is almost the same as the usual approach to hyphenation, except that a
box/glue is now inserted _after_ the break in addition to before. (The
bracket itself might be handled as some kind of expandable character, to
allow variations in line spacing.)

How would this be done if the text block area is not rectilinear, e.g. if
we're setting shaped paragraphs? I have noooo idea... :]


4. WHY?

My immediate motivation for this is to be able to typeset a fictitious
writing system, that is normally rendered in boustrophedon. (The same
demented imagination that produced it has given us most of the
"conventions" described above }:)

Would this be useful to a large proportion of our users? Hell no. Would it
be cool to do it anyway? Definitely. Would we get to brag about having
implemented the first and only high-quality typesetting engine capable of
boustrophedon? I've yet to find anyone else who's done so....


5. Implementation

I am nearly certain that to make all this functionality happen will require
modifications to TeX/Aleph itself, which is why I am here and not on
comp.text.tex <g>. (If all of this can be done reasonably well in a format,
I'd love to know how, but I'm not holding my breath.)

Note: I'm still struggling with the original 1981 Knuth/Plass paper
describing the TeX line-breaking algorithm, but I think I have a grasp on
the basics of how it operates. Please correct me if I get anything wrong
w.r.t. that.

I'll go over each separate thrust of work that I think will be needed.

5a. Alternate hyphenation convention

[This may or may not require code changes---it may be implementable via the
funky "algebra" of box/penalty/glue nodes that Knuth describes in his
paper---but for now, my working assumption is that changes will be needed.
I do need to investigate this further.]

According to Knuth's paper, in the line-breaking algorithm, potential
hyphenation points in words are marked by penalty nodes, which are
described by three values:

	p = penalty amount
	w = width of typeset material to append to the line if the line is
	    broken at this point (usu. the width of a hyphen)
	f = is this penalty flagged or unflagged? (boolean)

The alternate hyphenation convention would need two widths instead of one:

	w0 = (same as w, above)
	w1 = width of typeset material to prepend to the next line, if the
	     line is broken at this point

So (p,w0,w1,f) would describe the appropriately extended penalty node. w1,
naturally, would be zero in non-boustrophedon contexts.
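
To make the (p,w0,w1,f) idea concrete, here is a small Python sketch of
such a node and of how the two widths would enter a line's natural width.
The class and function names are invented for this example and do not
correspond to anything in the TeX or Aleph sources.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Penalty:
    p: int      # penalty amount
    w0: float   # width appended to the line that ends at this break
    w1: float   # width prepended to the line that starts after this break
    f: bool     # flagged or unflagged


def line_width(content_width: float,
               start_break: Optional[Penalty],
               end_break: Optional[Penalty]) -> float:
    """Natural width of a line lying between two chosen breaks.

    start_break is the penalty at which the previous line ended (None for
    the first line); end_break is the penalty at which this line ends
    (None for the last line).
    """
    lead = start_break.w1 if start_break is not None else 0.0
    trail = end_break.w0 if end_break is not None else 0.0
    return lead + content_width + trail


# Conventional hyphen: nothing is prepended to the following line.
hyphen = Penalty(p=50, w0=0.5, w1=0.0, f=True)
# Bracket convention: one leg on each side of the break.
bracket = Penalty(p=50, w0=0.25, w1=0.25, f=True)
```

With these values, a line of content width 10.0 that both starts and ends
at a `bracket` break has natural width 10.5, whereas with `hyphen` breaks
only the trailing 0.5 is added. The numbers themselves are arbitrary.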

5b. Singly-/doubly-ragged justification

I believe this is going to be one of the trickier bits. Let me render for
you a doubly-ragged paragraph, with the text block boundaries shown. I will
also mark two consecutive glue nodes A and B with "----":

	|]] ]]]]]] ]]] ]]]]] ]]]]]]] ] ]]]] ]]]]]] ]]]]]]]]----| <- A
	|    [[[[ [[[[[ [[ [[[[[ [[[ [[[[[[[ [[[[[[[ [[[ [[----| <- B
	|    ]]]]]]]]] ]]]]]] ]]] ]]] ]]]] ]]]]]] ]] ]]] ]]]]] |
	|  [[[[[[[[ [[ [[[ [[[[[[ [[[[[[[[ [[[[[[ [[[[ [[[[[[[ |
	|  ]]]] ]]]]]] ]] ]]] ]]]]] ]]] ]]]]]]]] ]]] ]]]]]]]]  |
	|         [[[[[ [[[[[ [[[[ [[[ [[[[[ [[[ [[[[ [[[[[[[  |

The alignment rule given earlier imposes the requirement that A and B be of
equal length, along with other pairs of glue nodes surrounding a line
break. Glue A can stretch and shrink, depending on how TeX wants to set the
line, but glue B must inflexibly match whatever width A turns out to have.

(Alternatively, you could pose the problem this way: The amount of space into
which each line can be fit depends on how far from the margin the previous
line ends. In non-boustrophedon contexts, these widths are fixed and fluid,
respectively---but here, because the two are linked, both are fluid.)

I'm not sure how this situation should be handled/represented within the
line-breaking algorithm. Should we have a new kind of glue node, that can
express two spacings with a line break in between?

A "glue_ref" node, with stretchability/shrinkability 0 and a width that is
essentially a pointer to that of some normal glue node?

Might there be some way of pre-processing the line-breaker's inputs,
post-processing the outputs, using some different assumptions, etc. to give
the desired result without otherwise modifying the algorithm? (E.g. by
treating A and B as a single large glue node, and doing some other
calculations?)
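
To show the coupling in isolation, here is a deliberately naive Python
sketch. It is a greedy first-fit breaker, not the Knuth/Plass algorithm,
and it only handles the doubly-ragged case with fixed interword space. The
function name and the tuple layout of the result are assumptions made for
this example.

```python
def greedy_boustrophedon(word_widths, block_width, space=1):
    """First-fit line breaking under the alignment rule.

    Returns one (left, right, word_count) tuple per line. The first line
    runs left-to-right from x = 0; each later line begins where the
    previous one ended and runs in the opposite direction.
    """
    lines = []
    pen = 0       # x position where the next line begins
    ltr = True    # direction of the next line
    i = 0
    while i < len(word_widths):
        # Room available depends on where the previous line ended.
        room = block_width - pen if ltr else pen
        width = word_widths[i]  # always place at least one word
        j = i + 1
        while j < len(word_widths) and width + space + word_widths[j] <= room:
            width += space + word_widths[j]
            j += 1
        end = pen + width if ltr else pen - width
        lines.append((min(pen, end), max(pen, end), j - i))
        pen, ltr, i = end, not ltr, j
    return lines


layout = greedy_boustrophedon([6, 6, 6, 5, 5, 5, 5], block_width=20)
```

For this input `layout` is [(0, 20, 3), (3, 20, 3), (3, 8, 1)]: the second
line stops at x = 3, so the third line has only 17 units of room rather
than 20. Because the sketch always places at least one word, a word wider
than the remaining room will stick out of the text block; an optimizing
breaker would instead have to reject or penalize such a break.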

5c. Reverse font metrics

So we've said that the right-to-left lines will usually be rendered in a
reflected font (e.g. xbmc10). One can expect that this reflected font will
be exactly the same as the left-to-right font, only mirrored. (I believe
there is a simple Metafont trick that will do this, basically scaling the
x-axis of everything by -1.)  But what if the reverse font has different
metrics? What if, say, you want the boustrophedon text to be rendered in a
slanted font, but you want the slant to be in the same direction for both
LtR and RtL lines? Or you want to differentiate the RtL lines with a bold
face?

This issue is, I believe, confined to the very heart of the line-breaking
code---the dynamic-programming algorithm itself. Basically, each box node
will be able to have one of two widths, and which one is in effect depends
on where the line breaks fall. If the two widths are unequal, the algorithm
will have to run through more possibilities to find optimal breaks, and so
run less efficiently. I really need to understand the algorithm better to
be able to say what changes would be needed, but I am fairly confident that
this will not require changes elsewhere.
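
As a toy model of the two-widths idea, the following Python sketch does a
total-fit search in which every word has a left-to-right width and a
right-to-left width, and the width in effect depends on the direction of
the line the word falls on. It is not the Knuth/Plass algorithm: the cost
is simply squared leftover space, every line has the same target width,
there is no stretch or shrink, and the alignment rule is ignored. All
names are hypothetical.

```python
from functools import lru_cache


def break_lines(words, line_width, space=1.0):
    """words: sequence of (ltr_width, rtl_width) pairs.

    Returns (cost, breaks), where breaks holds the index just past the
    last word of each line. The first line runs left-to-right and the
    direction alternates from there. The last line costs nothing.
    """
    words = tuple(words)
    n = len(words)

    def natural(i, j, ltr):
        k = 0 if ltr else 1  # which of the two widths is in effect
        return sum(w[k] for w in words[i:j]) + space * (j - i - 1)

    @lru_cache(maxsize=None)
    def best(i, ltr):
        # State is (next word, direction of next line).
        if i == n:
            return (0.0, ())
        result = (float("inf"), ())
        for j in range(i + 1, n + 1):
            width = natural(i, j, ltr)
            if width > line_width:
                break  # adding more words only makes the line wider
            slack = 0.0 if j == n else line_width - width
            cost, rest = best(j, not ltr)
            total = slack ** 2 + cost
            if total < result[0]:
                result = (total, (j,) + rest)
        return result

    return best(0, True)


same = break_lines([(3, 3)] * 6, line_width=11)
wider_rtl = break_lines([(3, 4)] * 6, line_width=11)
```

With identical metrics, `same` is (0.0, (3, 6)): two lines of three words.
When the right-to-left widths are larger, `wider_rtl` is (4.0, (3, 5, 6)),
because only two words now fit on the second line. In this simplified
model the only extra bookkeeping is the direction flag in the search
state; whether the real algorithm gets off that lightly is an open
question. If some word is wider than the line, the sketch returns an
infinite cost and no breaks.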

(Btw: This would yield a feature that can be used in non-boustrophedon
contexts: typesetting a paragraph with alternating lines in different
fonts. Heck, we could make it so the user can specify a whole sequence of
fonts, one for each successive line of a paragraph... }:])


--------------------------------


Now, I do want to make clear that all this is not just a particularly
detailed feature request; I intend to drive this work myself. I will,
however, not be able to do it all on my own---so the most I am asking for
here, aside from comments and advice on how to proceed, is the occasional
spot of help in doing so!

The point on which I am most keen to hear ideas is 5b (i.e. how to get the
line-breaker to respect the alignment rule). Also, if anyone is familiar
with the topic of boustrophedon typesetting, and what (if any) conventions
exist for it, I'd love to learn of them. It would help us get the
terminology straight, probably suggest some approaches not covered here,
and in general save me from having to work in a vacuum :)


Here's hoping to somehow make all this a reality,


--Danny


-- 
NAME   = Daniel Richard G.       ##  Remember, skunks       _\|/_  meef?
EMAIL1 = skunk@iskunk.org        ##  don't smell bad---    (/o|o\) /
EMAIL2 = skunk@alum.mit.edu      ##  it's the people who   < (^),>
WWW    = http://www.******.org/  ##  annoy them that do!    /   \
--
(****** = site not yet online)