Paragraph breaking bug with BiDi text
Hello,

When a line ends with a sequence whose direction differs from that of the paragraph, we risk pushing some text into the margin even when it is not necessary. Here is an example with corresponding output:

\usemodule[simplefonts]
\setmainfont[ALM Fixed][features=arabic,range=arabic]
\setupalign[r2l]
\setupwhitespace[big]
\showframe

\starttext

% 10 copies of the Persian word "hello" stay on one line.
\dorecurse{10}{سلام }

% 20 copies make a 2-line paragraph.
\dorecurse{20}{سلام }

% One copy of the word goes into the margin although the Latin letters fit perfectly on the line.
\dorecurse{10}{سلام } {\textdir TLT\dorecurse{20}{a}} \dorecurse{10}{سلام }

% Although the Latin string extends into the margin, TeX still puts one copy of "hello" there as well.
\dorecurse{10}{سلام } {\textdir TLT\dorecurse{30}{a}} \dorecurse{10}{سلام }

% Something similar happens here with the opposite par/text dir.
\pardir TLT \dorecurse{10}{bidi } {\textdir TRT\dorecurse{20}{آ}} \dorecurse{10}{bidi }

\stoptext

The problem seems to be that after typesetting the LTR text within the RTL paragraph, TeX thinks the current text ends at the left end of the LTR portion; hence, it tries to add something to the line, and only afterwards does it discover that we ran into the margin!

—MHB
Minor point: changing "\pardir TLT" to "\pardir TLT\textdir TLT" in the last paragraph produces a better visual result; however, the previous paragraphs already demonstrate the problem sufficiently.
On Fri, Jan 8, 2016 at 10:40 PM, Mohammad Hossein Bateni wrote:
[...]
On 1/9/2016 4:47 AM, Mohammad Hossein Bateni wrote:
Minor point: changing "\pardir TLT" to "\pardir TLT\textdir TLT" in the last paragraph produces a better visual result; however, the previous paragraphs already demonstrate the problem sufficiently.
It is a side effect of what the par builder considers to be valid breakpoints. The current approach is playing very safe, but after looking at it Taco and I decided that it can be more tolerant with respect to end dirs, so the next LuaTeX version will have that.

Anyway, you need to code carefully: the space after "TRT" in "\textdir TRT x" is meaningful, so in your example you introduce spaces.

Also, in ConTeXt don't use \textdir etc. directly; just use \lefttoright and \righttoleft in combination with \setupalign, as I will not spend much time on side effects of interfering with these low-level dir changers directly.
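A sketch of the test file from the first message rewritten along those lines: \setupalign sets the paragraph direction, \lefttoright and \righttoleft are used inside groups for the embedded runs, and \startalignment (as Wolfgang shows later in the thread) handles the left-to-right paragraph. No space is typed before a group, because every repeated word already ends in one. The font setup is copied from the original report and assumes the simplefonts module and the ALM Fixed font are available.

```tex
% Sketch only: the bug report's test file rewritten with the high-level
% direction commands. Font setup copied from the original report
% (assumes the simplefonts module and the ALM Fixed font are installed).
\usemodule[simplefonts]
\setmainfont[ALM Fixed][features=arabic,range=arabic]

\setupalign[r2l]      % right-to-left paragraphs, as in the original test
\setupwhitespace[big]
\showframe

\starttext

% Each repeated word already ends in a space, so nothing is typed
% between the last copy and the group; the space after the group
% stays outside it.
\dorecurse{10}{سلام }{\lefttoright\dorecurse{20}{a}} \dorecurse{10}{سلام }

\dorecurse{10}{سلام }{\lefttoright\dorecurse{30}{a}} \dorecurse{10}{سلام }

% Opposite case: a left-to-right paragraph with an embedded
% right-to-left run.
\startalignment[lefttoright]
\dorecurse{10}{bidi }{\righttoleft\dorecurse{20}{آ}} \dorecurse{10}{bidi }
\stopalignment

\stoptext
```

This only follows the coding advice. The breakpoint behaviour itself is what the next LuaTeX version is said to relax, so with the current engine the margin overflow may still show up.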
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________
--
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
It is a side effect of what the par builder considers to be valid breakpoints. The current approach is playing very safe, but after looking at it Taco and I decided that it can be more tolerant with respect to end dirs, so the next LuaTeX version will have that.
Anyway, you need to code carefully: the space after "TRT" in "\textdir TRT x" is meaningful, so in your example you introduce spaces.
Thanks for the explanation. So the space after \textdir TLT is meaningful but the one after \lefttoright isn't. (I guess I should have used \textdir TLT\relax. I'm going to use the high-level \lefttoright command from now on.)
Also, in ConTeXt don't use \textdir etc. directly; just use \lefttoright and \righttoleft in combination with \setupalign, as I will not spend much time on side effects of interfering with these low-level dir changers directly.
Great! You partly answered another question I meant to ask: what is the proper way to write \textdir and \pardir in ConTeXt? So \lefttoright and \righttoleft are replacements for \textdir TLT and \textdir TRT. How should I code \pardir TRT in ConTeXt? I couldn't find anything for that in spac-ali.mkiv.

Those low-level LuaTeX directives were part of an attempt to get to the core of a problem I'd run into with numbers at the end of a right-to-left line. The BiDi algorithm correctly gave left-to-right direction to that digit sequence, and the result was that the word following the number had gone into the margin.

—MHB
Mohammad Hossein Bateni <bateni@gmail.com> wrote on 11 January 2016 at 17:58:
Great! You partly answered another question I meant to ask: what is the proper way to write \textdir and \pardir in ConTeXt? So \lefttoright and \righttoleft are replacements for \textdir TLT and \textdir TRT. How should I code \pardir TRT in ConTeXt? I couldn't find anything for that in spac-ali.mkiv.

The \lefttoright and \righttoleft commands set the paragraph direction when you use them at the beginning of a paragraph.
\setupwhitespace[line]

\starttext

\input ward

\righttoleft \input ward

\lefttoright \input ward

\stoptext

Still, there is no need for \righttoleft, because you can use the normal alignment commands (\setupalign and \startalignment) to change the text direction.

\setupwhitespace[line]

\starttext

\input ward

\startalignment[righttoleft]
\input ward
\stopalignment

\input ward

\stoptext

Wolfgang
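A sketch of the \setupalign route Wolfgang mentions alongside \startalignment: right-to-left is made the document default once, and a single paragraph is switched back locally. It assumes that \setupalign accepts the same righttoleft key that \startalignment takes in the example above, and that lefttoright works as its counterpart.

```tex
% Sketch: right-to-left as the document-wide default via \setupalign,
% with one paragraph switched back to left-to-right locally.
% Assumes \setupalign takes the same righttoleft/lefttoright keys
% as \startalignment.
\setupalign[righttoleft]
\setupwhitespace[line]

\starttext

\input ward

\startalignment[lefttoright]
\input ward
\stopalignment

\input ward

\stoptext
```

Under that assumption, the first and last paragraphs should come out right-to-left and the middle one left-to-right, the mirror image of the second example above.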
On 1/11/2016 5:58 PM, Mohammad Hossein Bateni wrote:
Thanks for the explanation. So the space after \textdir TLT is meaningful but the one after \lefttoright isn't. (I guess I should have used \textdir TLT\relax. I'm going to use the high-level \lefttoright command from now on.)
In addition to Wolfgang's explanation: you can use grouping,

foo {\righttoleft oof} foo

and it's best to keep the spaces at the outer level, so not

foo{ \righttoleft oof }foo
participants (3)
- Hans Hagen
- Mohammad Hossein Bateni
- Wolfgang Schuster