automatic character comparison

Dear list,

I have the following sample:

    \setuppapersize[A7]
    \starttext
    \definecolumnset [paral] [n=2]
    \definesubcolumnset[paral][1][1]
    \definesubcolumnset[paral][2][2]
    \startcolumnset[paral]
    \startsubcolumnset[1]
    abce
    \stopsubcolumnset
    \startsubcolumnset[2]
    abcd
    \stopsubcolumnset
    \flushsubcolumnsets[spread]
    \startsubcolumnset[1]
    abc%
    \inframed
      [background=color,
       backgroundcolor=lightgreen,
       frame=off]
      {e}
    \stopsubcolumnset
    \startsubcolumnset[2]
    abc%
    \inframed
      [background=color,
       backgroundcolor=lightred,
       frame=off]
      {d}
    \stopsubcolumnset
    \flushsubcolumnsets[spread]
    \stopcolumnset
    \stoptext

What I want to achieve is automatic text comparison between versions of the same text (in different subcolumnsets).

The first line shows the different versions of the text. I wonder whether there would be an automatic way to get the \inframed highlighting with any character that differs from the other column (it might be different, or just missing or being added).

I think this may be possible with ConTeXt, but I don’t know how to achieve it automagically.

Any ideas on how to get this automatic text comparison?

Many thanks in advance,

Pablo

On 4 Mar 2025, at 18:26, Pablo Rodriguez via ntg-context wrote:
> The first line shows the different versions of the text. I wonder whether there would be an automatic way to get the \inframed highlighting with any character that differs from the other column (it might be different, or just missing or being added).

I'm not aware of anything built-in.

One way would be to use buffers: you enter your text into two buffers ("left" & "right"); compare them using a script which then modifies the buffers to highlight the changes in red or green; and then \getbuffer the buffers from inside the columnset commands to print them.

There's a good discussion of comparison algorithms at the link below, including source code in JavaScript (but not Lua, unfortunately). However, ConTeXt supports JavaScript with \startJScode ... \stopJScode so you could try adapting what's there.

https://neil.fraser.name/writing/diff/

Regards,
--
Bruce Horrocks
Hampshire, UK
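As an illustration of the comparison step in the buffer approach, here is a minimal sketch in Python using the standard difflib module. It is not Fraser's algorithm and not a ConTeXt interface: the function name and the macro names \Old and \New are hypothetical, and the sketch assumes the text contains no characters that are special to TeX.

```python
from difflib import SequenceMatcher


def mark_differences(left, right, old_macro=r"\Old", new_macro=r"\New"):
    """Return (left, right) with each differing run of characters
    wrapped in a macro call. The macro names are hypothetical."""
    out_left, out_right = [], []
    matcher = SequenceMatcher(None, left, right, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out_left.append(left[i1:i2])
            out_right.append(right[j1:j2])
            continue
        # "replace", "delete" and "insert": wrap whichever side has text.
        if i2 > i1:
            out_left.append(f"{old_macro}{{{left[i1:i2]}}}")
        if j2 > j1:
            out_right.append(f"{new_macro}{{{right[j1:j2]}}}")
    return "".join(out_left), "".join(out_right)


left_marked, right_marked = mark_differences("abce", "abcd")
print(left_marked)   # abc\Old{e}
print(right_marked)  # abc\New{d}
```

The two returned strings correspond to the "left" and "right" buffers: each side only carries markup for its own differing characters, so the highlighting macro can be defined separately for each column.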

On 5 Mar 2025, at 01:21, Bruce Horrocks wrote:
> On 4 Mar 2025, at 18:26, Pablo Rodriguez via ntg-context wrote:
>> The first line shows the different versions of the text. I wonder whether there would be an automatic way to get the \inframed highlighting with any character that differs from the other column (it might be different, or just missing or being added).
>
> I'm not aware of anything built-in.
>
> One way would be to use buffers: you enter your text into two buffers ("left" & "right"); compare them using a script which then modifies the buffers to highlight the changes in red or green; and then \getbuffer the buffers from inside the columnset commands to print them.
>
> There's a good discussion of comparison algorithms at the link below, including source code in Javascript (but not Lua, unfortunately). However, Context supports Javascript with \startJScode ... \stopJScode so you could try adapting what's there.

Sorry - ignore the JS bit (that's for embedding into the PDF). You'll need to translate Fraser's example code into Lua.

--
Bruce Horrocks
Hampshire, UK

On 3/5/25 02:27, Bruce Horrocks wrote:
>> On 5 Mar 2025, at 01:21, Bruce Horrocks wrote:
>> [...]
>> I'm not aware of anything built-in. One way would be to use buffers: you enter your text into two buffers ("left" & "right"); compare them using a script which then modifies the buffers to highlight the changes in red or green; and then \getbuffer the buffers from inside the columnset commands to print them.

Many thanks for your reply, Bruce. I think this might be a feasible approach for me.

>> There's a good discussion of comparison algorithms at the link below, including source code in Javascript (but not Lua, unfortunately).

There might be a Lua version (I think) here: https://github.com/google/diff-match-patch.

>> However, Context supports Javascript with \startJScode ... \stopJScode so you could try adapting what's there.
>> https://neil.fraser.name/writing/diff/
>
> Sorry - ignore the JS bit (that's for embedding into the PDF). You'll need to translate Fraser's example code into Lua.

We have https://www.pragma-ade.com/general/manuals/ecmascript-mkiv.pdf, so translating JS to Lua might not be required.

Many thanks for your help,

Pablo

On 3/5/25 02:21, Bruce Horrocks wrote:
> On 4 Mar 2025, at 18:26, Pablo Rodriguez wrote:
>> The first line shows the different versions of the text. I wonder whether there would be an automatic way to get the \inframed highlighting with any character that differs from the other column (it might be different, or just missing or being added).
>
> I'm not aware of anything built-in.

[In short, my previous request was about how to have two versions of the same text compared automatically.]

Replying to this message from Bruce, I want to describe what I think might do the job.

Since I’m just an average computer user (my background is in humanities), I would be grateful for comments on whether this makes sense (or not at all).

Not being inclined to reinvent the wheel, after some searching I found out that "git diff" can do a char-level comparison between two texts:

    git diff -U1000 --color-words=. one.md two.md > one-two.diff

[BTW, I use Markdown sources (which pandoc converts to XHTML and ConTeXt then typesets).]

Since the output contains the coloring commands, I need some substitutions with:

    sed -E -f normal.sed one-two.diff > one-two_normal.diff

The contents of the sed script read:

    s/(^[#]{2,3})\x1B\[m$/\1/g
    s/\x1B\[(36|1).+?\x1B\[m//g
    s/\x1B\[31m/\\Subs{/g
    s/\x1B\[32m/\\Add{/g
    s/\x1B\[m/}/g

Basically, this script removes info that ConTeXt cannot handle and translates color codes to \Add and \Subs commands.

For a minimal sample, the result reads:

    another te\Subs{x}\Add{s}t

On the left page with the older text, it might have the commands:

    \protected\def\Add#1{}
    \definehighlight[Subs] [color=red]

On the right page with the newer version, commands might read:

    \definehighlight[Add] [color=green]
    \protected\def\Subs#1{}

At least, this works with a minimal sample.

Is this a feasible approach? I don’t need the most efficient solution, just one that I can handle and that just works.

Many thanks in advance for your comments,

Pablo
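For readers who prefer a script over sed, the substitutions above can be sketched in Python. The escape codes (31 for removed text, 32 for added text, 1 and 36 for header lines, and a bare "[m" as reset) are taken from the sed script in this message. The function name is hypothetical, and dropping header lines entirely, including their line break, is a simplifying assumption rather than an exact copy of the sed behaviour.

```python
import re

# Header lines of the colored diff start with a bold (1) or cyan (36) code.
HEADER_LINE = re.compile(r"^\x1b\[(?:1|36)m.*\n?", re.MULTILINE)
# A Markdown heading marker (## or ###) followed only by a reset code.
HEADING_RESET = re.compile(r"^(#{2,3})\x1b\[m$", re.MULTILINE)


def ansi_to_context(diff_text):
    """Translate the color codes of a word diff into \\Subs{} and \\Add{}."""
    text = HEADING_RESET.sub(r"\1", diff_text)
    text = HEADER_LINE.sub("", text)
    text = text.replace("\x1b[31m", "\\Subs{")  # red: text only in the old file
    text = text.replace("\x1b[32m", "\\Add{")   # green: text only in the new file
    text = text.replace("\x1b[m", "}")          # reset closes the open group
    return text


sample = (
    "\x1b[1mdiff --git a/one.md b/two.md\x1b[m\n"
    "\x1b[36m@@ -1 +1 @@\x1b[m\n"
    "another te\x1b[31mx\x1b[m\x1b[32ms\x1b[mt\n"
)
print(ansi_to_context(sample), end="")  # another te\Subs{x}\Add{s}t
```

As with the sed version, this only works as long as the diff uses exactly these escape codes.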

On 3/25/2025 7:48 PM, Pablo Rodriguez via ntg-context wrote:
> Is this a feasible approach? I don’t need the most efficient solution, just one that I can handle and that just works.
>
> Many thanks in advance for your comments,

Whatever works for you is okay, right?

The attached is what is coming one day. The prototype (some 150 lines of code) works ok here but we need some interface that MS and I will look into when we pick up the columnsets track.

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------

On 3/25/25 20:04, Hans Hagen wrote:
> [...]
> Whatever works for you is okay right?
>
> The attached is what is coming one day. The prototype (some 150 lines of code) works ok here but we need some interface that MS and I will look into when we pick up the columnsets track.

Many thanks for the new implementation, Hans.

Is there any chance to become an early adopter of the new prototype for testing purposes?

Many thanks for your help,

Pablo

On 25 Mar 2025, at 18:48, Pablo Rodriguez via ntg-context wrote:
> Is this a feasible approach? I don’t need the most efficient solution, just one that I can handle and that just works.

I think relying on diff's colouring and a 1000 line change window would work but is not robust as it might unexpectedly break - e.g. if you were to port to another machine or change your terminal settings then you might get different escape sequences for the colours.

An alternative might be to use 'wdiff' which does a word-based comparison instead of the line-based comparison of diff. It also allows you to insert your choice of marker string before and after each change, making it easy to insert ConTeXt markup.

There's a LaTeX example in section 2.2 on this page which puts deleted text in boxes, and new text in double boxes. It should be pretty simple to adapt.

https://www.gnu.org/software/wdiff/manual/wdiff.html

Regards,
--
Bruce Horrocks
Hampshire, UK

On 3/26/25 23:57, Bruce Horrocks wrote:
> On 25 Mar 2025, at 18:48, Pablo Rodriguez via ntg-context wrote:
>> Is this a feasible approach? I don’t need the most efficient solution, just one that I can handle and that just works.
>
> I think relying on diff's colouring and a 1000 line change window would work but is not robust as it might unexpectedly break - e.g. if you were to port to another machine or change your terminal settings then you might get different escape sequences for the colours.

Many thanks for your reply, Bruce.

Besides the fact that ConTeXt will have built-in functionality for this, I think that it would be easy for me to adapt the different escape sequences for colors (in the rather improbable case that I have to port it to another machine or change my terminal settings).

BTW, after I sent the message, I realized that my approach was wrong in one detail. I wanted to add TeX commands in a Markdown source (which was going to be converted to XHTML). The right thing to do is to convert the escape sequences for colors into XML tags in the Markdown source.

> An alternative might be to use 'wdiff' which does a word-based comparison instead of the line-based comparison of diff. It also allows you to insert your choice of marker string before and after each change, making it easy to insert Context markup.

I knew there was such an option, but it isn’t available for MSYS2 (just in case I might need it there one day) and as far as I can remember it compares whole words, not single characters.

Many thanks for your help,

Pablo
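To make the difference between word-level and character-level comparison concrete, here is a small Python sketch built on the standard difflib module. It does not call or reimplement wdiff; it only mimics the idea of putting freely chosen marker strings around each change. The function name, its parameters and the default markers are assumptions made for this illustration.

```python
import re
from difflib import SequenceMatcher


def inline_diff(old, new, by_words=False,
                del_marks=("[-", "-]"), ins_marks=("{+", "+}")):
    """Merge two texts into one string with markers around each change."""
    # Compare either words (plus the whitespace between them) or characters.
    split = (lambda s: re.findall(r"\s+|\S+", s)) if by_words else list
    a, b = split(old), split(new)
    out = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append("".join(a[i1:i2]))
            continue
        if i2 > i1:  # units present only in the old text
            out.append(del_marks[0] + "".join(a[i1:i2]) + del_marks[1])
        if j2 > j1:  # units present only in the new text
            out.append(ins_marks[0] + "".join(b[j1:j2]) + ins_marks[1])
    return "".join(out)


print(inline_diff("another text", "another test", by_words=True))
# another [-text-]{+test+}
print(inline_diff("another text", "another test"))
# another te[-x-]{+s+}t
print(inline_diff("another text", "another test",
                  del_marks=("\\Subs{", "}"), ins_marks=("\\Add{", "}")))
# another te\Subs{x}\Add{s}t
```

With word granularity the whole word is flagged even though only one letter changed, which is the limitation mentioned above; character granularity narrows the marked span to the differing letter.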

Hi Pablo,

On Tue, 2025-03-04 at 19:26 +0100, Pablo Rodriguez via ntg-context wrote:
> What I want to achieve is automatic text comparison between versions of the same text (in different subcolumnsets).
>
> The first line shows the different versions of the text. I wonder whether there would be an automatic way to get the \inframed highlighting with any character that differs from the other column (it might be different, or just missing or being added).
>
> I think this may be possible with ConTeXt, but I don’t know how to achieve it automagically.

Not quite what you're asking for, but the "compare" script does something fairly similar:

    $ context --extra=compare <filename-1>.pdf <filename-2>.pdf

    $ context --extra=compare <filename-1>.pdf <filename-2>.pdf --colors=red,blue --result=<output-name>.pdf

The source for that script is in

    tex/context/base/mkiv/mtx-context-compare.tex

so maybe you can put together something similar from there?

Thanks,
-- Max

On 3/5/25 05:23, Max Chernoff via ntg-context wrote:
> Hi Pablo,

Hi Max,

many thanks for your reply.

> Not quite what you're asking for, but the "compare" script does something fairly similar: [...] The source for that script is in
> tex/context/base/mkiv/mtx-context-compare.tex
> so maybe you can put together something similar from there?

I’m afraid not. I used "compare" in the past, but I need to mark additions and deletions, not to see differences by superimposing one file on the other.

Sorry, but when I need to see how a document has been modified, diffpdf presents a clearer overview to me.

Many thanks for your help,

Pablo
participants (4): Bruce Horrocks, Hans Hagen, Max Chernoff, Pablo Rodriguez