[NTG-context] String substitution using regular expressions and backreferences

Hans Hagen j.hagen at freedom.nl
Fri Aug 26 09:34:08 CEST 2022


On 8/25/2022 9:44 PM, Thangalin via ntg-context wrote:
> I've attempted to apply Wolfgang's subtle suggestion of using Lua to parse
> the input document using a regular expression via lpeg.replacer. The
> replacement itself works fine; however, in doing so the XML document
> structure is converted to text, which means that it is no longer possible
> to "flush" the XML for further processing as XML. The result is that any
> unresolved XML tags are written verbatim to the PDF:
> 
> https://i.stack.imgur.com/9ZFND.png
> 
> There are two other issues with this approach. First is efficiency. Second
> is that the processing function would have to be called for every XML
> element to capture the replacement.
> 
> My original post asked about applying regex word substitution in a ConTeXt
> way, such as:
> 
> \definereplacement[SubstMac][ match={Mc([A-Z].*)}, replace={\Mac \\1} ]
> \definereplacement[SubstPostmeridian][ match={[Pp]\\.[Mm]\\.},
> replace={\cap{pm}} ]
> 
> That seems like the cleanest approach because it would work on top of XML
> or any other source document. Nevertheless, here is what I tried, which
> partially works:
> 
> \startbuffer[main]
> <html>
>    <p>“Mr. McAnulty, I presume?”</p>
>    <p>Regular text. <em>Irregular text.</em></p>
> </html>
> \stopbuffer
> \startxmlsetups xml:xhtml
>    \xmlsetsetup{\xmldocument}{*}{-}
>    \xmlsetsetup{\xmldocument}{html|p|em}{xml:*}
> \stopxmlsetups
> \startxmlsetups xml:html
>    \startdocument
>      \xmlflush{#1}
>    \stopdocument
> \stopxmlsetups
> % Paragraphs are followed by a paragraph break, but only if not nested.
> \startxmlsetups xml:p
>    \xmlfunction{#1}{p}
>    \par
> \stopxmlsetups
> \startxmlsetups xml:em
>    \dontleavehmode{\em\xmlflush{#1}}
> \stopxmlsetups
> \startluacode
> function xml.functions.p( t )
>    rep = { [1] = { "McAnulty", "\\Mac Anulty" } }
>    x = lpeg.replacer( rep ):match( tostring( xml.text( t ) ) )
> 
>    buffers.assign( "p", context( x ) )
>    context.getbuffer{ "p" }
> end
> \stopluacode
> \xmlregistersetup{xml:xhtml}
> \def\Mac{%
>    % Determine the sizes of 'M' and 'c'.
>    \newbox\MacMBox%
>    \setbox\MacMBox\hbox{M}%
>    \newbox\MacCBox%
>    \setbox\MacCBox\hbox{c}%
>    %
>    % Cheat to dynamically derive the kerning size by putting Mc in a box.
>    %
>    \newbox\MacKernBox%
>    \setbox\MacKernBox\hbox{\inframed[offset=\zeropoint, width=fit]{Mc}}%
>    \def\MacDelta{\dimexpr\wd\MacKernBox-\wd\MacMBox-\wd\MacCBox\relax}%
>    \def\MacUWidth{\dimexpr\wd\MacCBox-.75\MacDelta\relax}%
>    \def\MacRule{\vrule width \MacUWidth height .04em depth \zeropoint \relax}%
>    \def\MacKern{\dimexpr\wd\MacKernBox-\wd\MacMBox-\wd\MacCBox\relax}%
>    \def\MacHeight{\dimexpr\ht\MacMBox-\ht\MacCBox\relax}%
>    %
>    % Write Mc, where c has a macron, to the document.
>    %
>    M{%
>      \dontleavehmode{\raisebox{\MacHeight}\hbox{c}}%
>      \kern-1.04\MacUWidth
>      \MacRule
>      \kern.08\MacUWidth
>    }%
> }%
> \xmlprocessbuffer{main}{main}{}
> 
> As shown in the screen shot, this doesn't correctly handle nested XML
> elements.
> 
> Any ideas on what approach to take to perform a string replacement in
> ConTeXt?

Best stay at the xml end: do the replacement on the text nodes of the
tree and then flush as usual ...

\startbuffer[main]
<html>
   <p>“Mr. McAnulty, I presume?”</p>
   <p>Regular text. <em>Irregular text.</em></p>
</html>
\stopbuffer

\startxmlsetups xml:xhtml
   \xmlsetsetup{\xmldocument}{*}{-}
   \xmlsetsetup{\xmldocument}{html|p|em}{xml:*}
\stopxmlsetups

\startxmlsetups xml:html
     \xmlflush{#1}
\stopxmlsetups

\startxmlsetups xml:p
     \xmlfunction{#1}{p}
     \xmlcontext{#1}
     \par
\stopxmlsetups

\startxmlsetups xml:em
   \dontleavehmode{\em\xmlflush{#1}}
\stopxmlsetups

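\startluacode
     -- build the replacer once: McAnulty becomes the Mac macro plus Anulty
     local rep = lpeg.replacer { [1] = { "McAnulty", "\\Mac Anulty" } }
     function xml.functions.p(t)
         -- dt holds the content of the element; strings are the text parts
         local dt = t.dt
         for i=1,#dt do
             local di = dt[i]
             if type(di) == "string" then
                 -- rewrite the text in place, so the tree stays xml
                 dt[i] = lpeg.match(rep,di)
             end
         end
     end
\stopluacode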

\xmlregistersetup{xml:xhtml}

\startdocument
     \xmlprocessbuffer{main}{main}{}
\stopdocument
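
If the match has to be a pattern instead of a fixed name, closer to the
Mc([A-Z].*) and [Pp]\.[Mm]\. expressions in the question, the replacer can
be built from lpeg primitives. What follows is an untested sketch, not part
of the solution above. The names mac, pm and replace are made up for
illustration. It also assumes that child elements are tables that carry
their own dt, so that text inside a nested <em> is reached as well. It is
meant to take the place of the \startluacode block in the example above;
the setups stay the same.

\startluacode
     local P, R, S, C, Cs = lpeg.P, lpeg.R, lpeg.S, lpeg.C, lpeg.Cs

     -- Mc followed by an uppercase letter; the captured letter is
     -- handed to the function and put back after the Mac macro
     local mac = (P("Mc") * C(R("AZ"))) / function(c)
         return "\\Mac " .. c
     end

     -- p.m. in either letter case, replaced by a fixed string
     local pm = (S("Pp") * P(".") * S("Mm") * P(".")) / "\\cap{pm}"

     -- try both rules at each position, otherwise copy one byte
     local rep = Cs((mac + pm + P(1))^0)

     -- hypothetical helper: rewrite strings, descend into child elements
     local function replace(e)
         local dt = e.dt
         if not dt then
             return
         end
         for i=1,#dt do
             local di = dt[i]
             if type(di) == "string" then
                 dt[i] = lpeg.match(rep,di)
             elseif type(di) == "table" then
                 replace(di)
             end
         end
     end

     function xml.functions.p(t)
         replace(t)
     end
\stopluacode

Because P(1) copies the input byte by byte, multi-byte UTF-8 characters
such as the curly quotes pass through unchanged. Running the function a
second time over the same node should be harmless, since the replaced text
no longer contains anything that either rule matches.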

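But this is more fun and probably also more reliable: let the font do the
substitution, with a ligature feature that maps the letters onto a private
box glyph: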

\startbuffer[main]
<html>
   <p>“Mr. McAnulty, I presume?”</p>
   <p>Regular text. <em>Irregular text.</em></p>
</html>
\stopbuffer

\startxmlsetups xml:xhtml
   \xmlsetsetup{\xmldocument}{*}{-}
   \xmlsetsetup{\xmldocument}{html|p|em}{xml:*}
\stopxmlsetups

\startxmlsetups xml:html
     \xmlflush{#1}
\stopxmlsetups

\startxmlsetups xml:p
     \xmlcontext{#1}
     \par
\stopxmlsetups

\startxmlsetups xml:em
   \dontleavehmode{\em\xmlflush{#1}}
\stopxmlsetups

\xmlregistersetup{xml:xhtml}

\usemodule[gimmicks] % in latest uploads

\chardef\MacAnulty = \getprivateglyphslot{MacAnulty}

\startsetups [box:mcanulty:\number\MacAnulty]
     \Mac Anulty
\stopsetups

\registerboxglyph category {mcanulty} unicode \MacAnulty \relax

\startluacode
     fonts.handlers.otf.addfeature {
         name    = "mcanulty",
         type    = "ligature",
         nocheck = true,
         data    = {
             [fonts.constructors.privateslots.MacAnulty] = {
                 "M", "c", "A", "n", "u", "l", "t", "y",
             },
         }
     }
\stopluacode

\definefontfeature[default][default][box=mcanulty,mcanulty=yes]

\startdocument
     \xmlprocessbuffer{main}{main}{}
\stopdocument

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------

