I've tried Wolfgang's subtle suggestion of using Lua to run a pattern replacement over the input document via lpeg.replacer. The replacement itself works fine; however, it converts the XML document structure to text, so the XML can no longer be "flushed" for further processing as XML. As a result, any unresolved XML tags are written verbatim to the PDF.
There are two other issues with this approach. The first is efficiency. The second is that the processing function would have to be called for every XML element in order to catch every replacement.
My original post asked about applying regex word substitution in a ConTeXt way, such as:
\definereplacement[SubstMac][ match={Mc([A-Z].*)}, replace={\Mac \\1} ]
\definereplacement[SubstPostmeridian][ match={[Pp]\\.[Mm]\\.}, replace={\cap{pm}} ]
That seems like the cleanest approach because it would work on top of XML or any other source document. Nevertheless, here is what I tried, which partially works:
\startbuffer[main]
<html>
<p>“Mr. McAnulty, I presume?”</p>
<p>Regular text. <em>Irregular text.</em></p>
</html>
\stopbuffer
\startxmlsetups xml:xhtml
\xmlsetsetup{\xmldocument}{*}{-}
\xmlsetsetup{\xmldocument}{html|p|em}{xml:*}
\stopxmlsetups
\startxmlsetups xml:html
\startdocument
\xmlflush{#1}
\stopdocument
\stopxmlsetups
\startxmlsetups xml:p
\xmlfunction{#1}{p}
\par
\stopxmlsetups
\startxmlsetups xml:em
\dontleavehmode{\em\xmlflush{#1}}
\stopxmlsetups
\startluacode
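-- Build the replacer once instead of on every call.
local rep = { [1] = { "McAnulty", "\\Mac Anulty" } }
local replacer = lpeg.replacer( rep )
function xml.functions.p( t )
-- The element is turned into a plain string here, so nested tags such as <em> are no longer handled as XML.
local x = replacer:match( tostring( xml.text( t ) ) )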
buffers.assign( "p", context( x ) )
context.getbuffer{ "p" }
end
\stopluacode
\xmlregistersetup{xml:xhtml}
% Allocate the boxes once; \newbox inside \Mac would claim new registers on every call.
\newbox\MacMBox
\newbox\MacCBox
\newbox\MacKernBox
\def\Mac{
\setbox\MacMBox\hbox{M}
\setbox\MacCBox\hbox{c}
\setbox\MacKernBox\hbox{\inframed[offset=\zeropoint, width=fit]{Mc}}
\def\MacDelta{\dimexpr\wd\MacKernBox-\wd\MacMBox-\wd\MacCBox\relax}
\def\MacUWidth{\dimexpr\wd\MacCBox-.75\MacDelta\relax}
\def\MacRule{\vrule width \MacUWidth height .04em depth \zeropoint \relax}
\def\MacKern{\dimexpr\wd\MacKernBox-\wd\MacMBox-\wd\MacCBox\relax}
\def\MacHeight{\dimexpr\ht\MacMBox-\ht\MacCBox\relax}
M{
\dontleavehmode{\raisebox{\MacHeight}\hbox{c}}
\kern-1.04\MacUWidth
\MacRule
\kern.08\MacUWidth
}
}
\xmlprocessbuffer{main}{main}{}
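As shown in the screenshot, this doesn't handle nested XML elements correctly: tags nested inside a paragraph, such as the <em> in the second paragraph, are written out verbatim instead of being processed.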
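One idea that might get around the nesting problem is to run the substitutions only on the text nodes of the tree and to recurse into child elements, so the element structure is never turned into a string. The sketch below is plain Lua, not ConTeXt code, and rests on several assumptions: the `tg`/`dt` table layout is only loosely modeled on a parsed XML tree, the names `rules`, `substitute`, `walk` and `serialize` are made up for illustration, and the rules are Lua patterns rather than regexes. Whether TeX commands injected into text nodes this way are actually interpreted when the XML is flushed is not verified here.

```lua
-- Hypothetical rules, written as Lua patterns (not regexes).
local rules = {
    { "Mc(%u%a*)",    "\\Mac %1"  },  -- McAnulty -> \Mac Anulty
    { "[Pp]%.[Mm]%.", "\\cap{pm}" },  -- p.m. or P.M. -> \cap{pm}
}

-- Apply every rule, in order, to a single string.
local function substitute(text)
    for _, rule in ipairs(rules) do
        text = text:gsub(rule[1], rule[2])
    end
    return text
end

-- Assumed toy layout: an element is { tg = name, dt = { children } },
-- and a child that is a string is a text node.
local function walk(element)
    local dt = element.dt
    for i = 1, #dt do
        local child = dt[i]
        if type(child) == "string" then
            dt[i] = substitute(child)  -- rewrite text, keep its position
        else
            walk(child)                -- nested element: recurse, do not flatten
        end
    end
end

-- Helper for inspecting the result; it only prints tags and text.
local function serialize(node)
    if type(node) == "string" then
        return node
    end
    local parts = {}
    for _, child in ipairs(node.dt) do
        parts[#parts + 1] = serialize(child)
    end
    return "<" .. node.tg .. ">" .. table.concat(parts) .. "</" .. node.tg .. ">"
end

local doc = { tg = "html", dt = {
    { tg = "p", dt = { "Mr. McAnulty, I presume?" } },
    { tg = "p", dt = { "Regular text at 5 p.m. ",
                       { tg = "em", dt = { "Irregular text." } } } },
} }

walk(doc)
print(serialize(doc))
```

Because only string children are rewritten, the <em> element in the second paragraph survives as an element and could still be handed to its own setup. A single walk over the tree would also address the concern about having to register a processing function for every element.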