Dear list,

I have the following sample:

\startbuffer[demo]
<html>
  <body>
    <div id="First">
      <p>This is <span class="special">One of the best</span> a paragraph.</p>
      <p>This is another paragraph.</p>
      <p>This is another <span class="special">Two of the best</span> paragraph.</p>
      <p>This is another <span class="special">Three</span> paragraph.</p>
      <p>This is another <span class="special">Four of five</span> paragraph.</p>
    </div>
  </body>
</html>
\stopbuffer

\startxmlsetups xml:initialize
  \xmlsetsetup{#1}{html}{xml:gen}
\stopxmlsetups

\xmlregistersetup{xml:initialize}

\startxmlsetups xml:gen
  \xmlfilter{#1}{/**/div/command(xml:special)}
\stopxmlsetups

\startxmlsetups xml:special
  %~ \startitem \cldcontext{string.gsub(lxml.flush([[#1]]), " of the ", "")}\stopitem
\stopxmlsetups

\starttext
  \xmlprocessbuffer{main}{demo}{}
\stoptext

Is there any way to remove " of " and " of the " in the filtered content (xml:special)?

Sorry, Lua code is crap for sure.

Many thanks for your help,

Pablo
--
http://www.ousia.tk
There is pretty much always 'a way', but I do not know of a 'nice' way.

Your problem is that lxml.flush() and friends do not return a value; they just do a direct context('xxxx') call behind the scenes, with no return string for you to modify. Also, the special (catcode, space handling) rules for setups and \cldcontext do not help you.

That does not mean it can't be done. As I don't know of a nice way, here is a low-level 'ugly' way:

\startluacode
function filter(a)
  -- resolve the setup argument to the node table, edit it, then flush it
  local div = lxml.getid(a)
  process(div)
  lxml.flush(div)
end

function process(div)
  -- div.dt holds the children: strings are text, tables are nested nodes
  for c=1,#div.dt do
    if type(div.dt[c]) == 'string' then
      div.dt[c] = string.gsub(div.dt[c], " of the ", "")
    else
      process(div.dt[c])
    end
  end
end
\stopluacode

\startxmlsetups xml:special
  \ctxlua{filter([[#1]])}
\stopxmlsetups

process() is recursive because your xml:special gets the whole <div>. Not sure if you intended it that way.

And if it can be done nicer, I am sure someone will correct me :)

Best wishes,
Taco
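To see what the recursive walk does without running ConTeXt, here is a small stand-alone sketch. It assumes only what the code above relies on, namely that a node keeps its children in a `dt` table where strings are text and tables are nested nodes. The table `p` is a hypothetical stand-in for one paragraph of the sample, not a real lxml node.

```lua
-- Stand-alone mock: only the `dt` field of an lxml node is imitated here.
local function process(node)
  for i = 1, #node.dt do
    local child = node.dt[i]
    if type(child) == "string" then
      -- the parentheses keep only gsub's first result (the new string)
      node.dt[i] = (string.gsub(child, " of the ", ""))
    else
      process(child)
    end
  end
end

-- Hypothetical stand-in for:
-- <p>This is <span class="special">One of the best</span> a paragraph.</p>
local p = {
  dt = {
    "This is ",
    { dt = { "One of the best" } },
    " a paragraph.",
  },
}

process(p)
print(p.dt[2].dt[1]) --> Onebest
```

Note that the replacement string is empty, so the surrounding spaces disappear together with the words; replacing with " " instead would keep the words apart.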
On 8/20/20 11:08 AM, Taco Hoekwater wrote:
Many thanks for your explanation, Taco.
You're right, my xml:special wasn't intended to get the whole <div>.

I was tinkering with a previous sample. And I removed an \xmlfilter. Since I got no output, I didn't see what I was missing.

Many thanks for your help,

Pablo
On 8/19/2020 6:10 PM, Pablo Rodriguez wrote:

[...] paragraph.</p>
      <p>This is another paragraph.</p>
      <p>This is another <span class="special">Two of the best</span> paragraph.</p>
      <p>This is another <span class="special">Three</span> paragraph.</p>
      <p>This is another <span class="special">Four of five</span> paragraph.</p>
    </div>
    <div id="Second">
      <p>This is <span class="special">One of the best</span> a paragraph.</p>
      <p>This is another paragraph.</p>
      <p>This is another <span class="special">Two of the best</span> paragraph.</p>
      <p>This is another <span class="special">Three</span> paragraph.</p>
      <p>This is another <span class="special">Four of five</span> paragraph.</p>
    </div>
  </body>
</html>
\stopbuffer

\startxmlsetups xml:initialize
  \xmlsetsetup{#1}{html}{xml:gen}
  \xmlsetsetup{#1}{span[@class='special']}{xml:span:special}
\stopxmlsetups

\xmlregistersetup{xml:initialize}

\startxmlsetups xml:gen
  \startitemize
    \xmlfilter{#1}{/**/div/command(xml:special)}
  \stopitemize
\stopxmlsetups

\startxmlsetups xml:special
  \startitem <\xmlflush{#1}> \stopitem
\stopxmlsetups

\startxmlsetups xml:span:special
  (\cldcontext{(string.gsub([[\xmlraw{#1}{.}]]," of the ", ""))})
\stopxmlsetups

\starttext
  \xmlprocessbuffer{main}{demo}{}
\stoptext

Or make a finalizer as Taco posted.

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
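A side note on the extra pair of parentheses around string.gsub in xml:span:special: in plain Lua, string.gsub returns two values, the new string and the number of substitutions, and wrapping a call in parentheses keeps only the first. That is presumably why the call is wrapped before its result is handed to \cldcontext. The sketch below shows only the Lua behaviour; it makes no claim about how context() treats a second argument.

```lua
local s = "One of the best"

-- string.gsub returns the new string and the number of substitutions
local text, count = string.gsub(s, " of the ", "")

-- select("#", ...) counts how many values an expression passes on
local without_parens = select("#", string.gsub(s, " of the ", ""))
local with_parens    = select("#", (string.gsub(s, " of the ", "")))

print(text, count, without_parens, with_parens) --> Onebest  1  2  1
```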
On 8/20/20 11:27 AM, Hans Hagen wrote:
Many thanks for your reply, Hans.

I now see that \xmlraw is the way to go.

I have two questions about word replacement and Lua (maybe there is some lpeg magic that could be used).

This time, I have to remove two words, such as in:

string.gsub([[\xmlraw{#1}{.}]]," del ", " "):gsub(" de la ", " ")

But there could be more (and replacements might be added to that list). Is there a more elegant way than appending :gsub()?

Is there also a proper way to scan for words? A "word" can be "Word ", " word ", " word.", " word?" (and so on). I would like to avoid having to code all combinations (of course, if this is already available).

Many thanks for your help,

Pablo
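On the plain Lua side of both questions, one possible approach is to keep the words in a table and loop over it, using frontier patterns (%f[...]) so that the neighbouring space or punctuation does not have to be spelled out. This is only a sketch: `stopwords`, `escape` and `remove_words` are hypothetical names, the match is case-sensitive, and %w works on bytes, so non-ASCII letters in UTF-8 text count as word boundaries.

```lua
-- Hypothetical list of words or phrases to drop; extend as needed.
local stopwords = { "de la", "del" }

-- Escape Lua pattern magic characters so entries are matched literally.
local function escape(s)
  return (s:gsub("%p", "%%%0"))
end

local function remove_words(text, words)
  for _, w in ipairs(words) do
    -- %f[%w] and %f[%W] match at the start and end of a word without
    -- consuming the character next to it
    text = text:gsub("%f[%w]" .. escape(w) .. "%f[%W]", "")
  end
  -- collapse the runs of spaces left behind and trim both ends
  text = text:gsub("%s+", " "):gsub("^%s", ""):gsub("%s$", "")
  return text
end

print(remove_words("Historia del arte de la India", stopwords))
--> Historia arte India
```

Inside a setup this could be called as \cldcontext{remove_words([[\xmlraw{#1}{.}]], stopwords)}, assuming both names were made global in a \startluacode block. If one entry is a prefix of another, listing the longer one first avoids partial removals.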
On 8/20/2020 12:38 PM, Pablo Rodriguez wrote:

old stuff, present for a long time ... probably documented somewhere ... if not, then you have to wikify it ...

\starttext

\replaceword[whatever][this][that]
\replaceword[whatever][that][this]

\startlines
it is this or that
{\setreplacements[whatever]it is this or that}
it is this or that
\stoplines

\stoptext

Hans
On 8/20/20 1:10 PM, Hans Hagen wrote:
Many thanks for your reply, Hans.

It is already wikified (https://wiki.contextgarden.net/Ligatures#Replacements).

I wonder whether \replaceword could be extended to replace multiple words and also to remove them.

\starttext

\replaceword[whatever][this or][no]
\replaceword[whatever][that][]

\startlines
it is this or that
{\setreplacements[whatever]it is this or that}
{\setreplacements[whatever]it is this or that}
it is this or that
\stoplines

\stoptext

Many thanks for your help,

Pablo
On 8/20/2020 4:20 PM, Pablo Rodriguez wrote:

this feature creep is in the next upload

Hans
On 8/21/20 2:59 PM, Hans Hagen wrote:
Hans,

many thanks for the new feature.

Pablo
participants (3): Hans Hagen, Pablo Rodriguez, Taco Hoekwater