[NTG-context] Off-topic: Struggles with LPEG grammar
luigi scarso
luigi.scarso at gmail.com
Mon Dec 21 14:47:05 CET 2020
On Mon, Dec 21, 2020 at 2:36 PM Taco Hoekwater <taco at elvenkind.com> wrote:
>
>
> > On 21 Dec 2020, at 14:08, Mojca Miklavec <mojca.miklavec.lists at gmail.com> wrote:
> >
> > Dear Taco,
> >
> > On Mon, 21 Dec 2020 at 13:46, Taco Hoekwater wrote:
> >>> On 21 Dec 2020, at 13:16, Mojca Miklavec wrote:
> >>>
> >>> My only explanation would be that perhaps "^1" is so greedy that the
> >>> rest of the pattern doesn't get found. But I don't want to believe
> >>> that explanation.
> >>
> >> Which (of course) means that that is exactly what happens ;)
> >>
> >> The ones that match are
> >>
> >> ababbb (a (ba+bb) b) => r4 r1(r3(r5 r4) r2(r5 r5)) r5
> >> abbbab (a (bb+ba) b) => r4 r1(r2(r5 r5) r3(r5 r4)) r5
> >>
> >> With the ^1, in the “bb” cases the first “b” eats all three “b”s:
> >>
> >> ababbb fails the r5 at the end
> >>
> >> abbbab fails the first r2 already (since the second r5 therein never happens)
> >
> > Is this a deliberate choice, a limitation of the grammar
> > expressiveness, some misuse on my side that could/should/needs to be
> > implemented in a different way, or does it count as a "bug" on the
> > lpeg side?
> >
> > For example, I wouldn't expect a regexp "b+b" to fail on "bbb" just
> > because "b+" would eat all three "b"s at once (the regexp "b+b" in
> > fact finds "bbb", and I would expect a less-than-totally-greedy hit
> > with lpeg as well). Or is my reasoning wrong here?
>
> PEGs are greedy by design, which is a consequence of the fact that PEGs do
> not backtrack, which goes back to the underlying assumptive rule of PEGs
> that there is one (and only one!) ‘correct’ way to parse the input.
> Allowing backtracking destroys that assumption and by doing so would
> complicate the system to a level that would make it comparable to PCRE
> (with all the associated penalties on processing speed and a much greater
> codebase).
>
Greedy vs. non-greedy matching is one of the things that I always try to keep
in mind when I start with lpeg, and I regularly fail to apply it, because I
think in the "perl regex way".
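
A minimal sketch of the difference, assuming the stock lpeg module is installed (the names greedy and restrained are only for illustration). The first pattern is the literal translation of the regexp "b+b"; the second is one possible way to leave the last "b" unconsumed by using a lookahead:

```lua
local lpeg = require "lpeg"
local P = lpeg.P

-- literal translation of "b+b": the repetition takes every "b" and never
-- gives one back, so the trailing "b" has nothing left to match
local greedy = P"b"^1 * P"b"

-- each repeated "b" must be followed by another "b" (#patt is a lookahead
-- and consumes nothing), which leaves the last "b" for the final P"b"
local restrained = (P"b" * #P"b")^1 * P"b"

print (lpeg.match (greedy, "bbb"))      -- nil
print (lpeg.match (restrained, "bbb"))  -- 4
```

The 4 is the position just after the match, which is what lpeg.match returns when the pattern has no captures.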
Anyway,
http://www.gammon.com.au/lpeg
has some good examples,
e.g. this one (from the lpeg site), which finds the pattern anywhere in the line:
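  local lpeg = require "lpeg"
  -- try p at the current position; if that fails, skip one character and retry
  function anywhere (p)
    return lpeg.P { p + 1 * lpeg.V(1) }
  end
  local target = "hot dog"  -- sample subject, not from the original example
  print (lpeg.match (anywhere ("dog"), target))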
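
If the position of the hit is needed as well, a small variation along the lines of the position-capturing version in the lpeg documentation may help. This is only a sketch: the name anywhere_pos and the subject string are made up for illustration.

```lua
local lpeg = require "lpeg"

local I = lpeg.Cp()   -- position capture: yields the current subject index

-- hypothetical helper: like anywhere, but captures where the match
-- starts and where it ends (the index just past the last character)
local function anywhere_pos (p)
  return lpeg.P { I * p * I + 1 * lpeg.V(1) }
end

print (lpeg.match (anywhere_pos ("dog"), "hot dog"))   -- 5  8
```

The grammar still works left to right: it tries p at the current position and only skips a character when that fails, so it reports the first occurrence.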
--
luigi