On Mon, Dec 21, 2020 at 2:36 PM Taco Hoekwater wrote:
On 21 Dec 2020, at 14:08, Mojca Miklavec wrote:
Dear Taco,
On Mon, 21 Dec 2020 at 13:46, Taco Hoekwater wrote:
On 21 Dec 2020, at 13:16, Mojca Miklavec wrote:
My only explanation would be that perhaps "^1" is so greedy that the rest of the pattern doesn't get found. But I don't want to believe that explanation.
Which (of course) means that that is exactly what happens ;)
The ones that match are
ababbb (a (ba+bb) b) => r4 r1(r3(r5 r4) r2(r5 r5)) r5
abbbab (a (bb+ba) b) => r4 r1(r2(r5 r5) r3(r5 r4)) r5
With the ^1, in the “bb” cases the first “b” eats all three “b”s:
ababbb fails the r5 at the end
abbbab fails the first r2 already (since the second r5 therein never happens)
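To see this effect in isolation, here is a minimal LPeg sketch. The rule names r1 to r5 are taken from the traces above, but the grammar itself, the start rule `r0`, and the place where the ^1 sits (on r5) are assumptions, since the original grammar is not quoted in this thread. The exact rule that fails may therefore differ from the description above.

```lua
local lpeg = require "lpeg"
local P, V = lpeg.P, lpeg.V

-- Hypothetical reconstruction: only the definition of r5 differs
-- between the two grammars built below.
local function grammar(b)
  return P{
    "r0",
    r0 = V"r4" * V"r1" * V"r5" * -1,      -- -1 means "end of input"
    r1 = V"r2" * V"r3" + V"r3" * V"r2",
    r2 = V"r5" * V"r5",
    r3 = V"r5" * V"r4",
    r4 = P"a",
    r5 = b,
  }
end

local plain  = grammar(P"b")     -- r5 matches exactly one "b"
local greedy = grammar(P"b"^1)   -- r5 swallows every consecutive "b"

for _, s in ipairs{ "ababbb", "abbbab" } do
  print(s, plain:match(s), greedy:match(s))
end
-- ababbb  7  nil
-- abbbab  7  nil
```

With the plain r5 both strings match to the end (position 7). With the ^1 variant the first r5 in a run of "b"s leaves nothing for the r5 that follows it, so both matches fail.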
Is this a deliberate choice, a limitation of the grammar expressiveness, some misuse on my side that could/should/needs to be implemented in a different way, or does it count as a "bug" on the lpeg side?
For example, I wouldn't expect a regexp "b+b" to fail on "bbb" just because "b+" would eat all three "b"s at once (the regexp "b+b" in fact finds "bbb", and I would expect a less-than-totally-greedy hit with lpeg as well). Or is my reasoning wrong here?
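The difference can be checked directly. The sketch below compares Lua's built-in string patterns (which do give characters back) with the literal LPeg translation of "b+b", and then shows one possible rewrite using the lookahead operator `#`. The name `b_plus_b` is made up for this example.

```lua
local lpeg = require "lpeg"
local P = lpeg.P

local subject = "bbb"

-- Lua string patterns backtrack: "b+" gives one "b" back.
print(subject:match("^b+b$"))                     -- bbb

-- The literal LPeg translation does not: "b"^1 keeps all three.
print(lpeg.match(P"b"^1 * P"b" * -1, subject))    -- nil

-- One possible rewrite: only consume a "b" while another "b" follows,
-- so the last one is left for the trailing P"b".
local b_plus_b = (P"b" * #P"b")^1 * P"b" * -1
print(lpeg.match(b_plus_b, subject))              -- 4
```

The lookahead version works here because it is known in advance what has to follow the repetition. In a larger grammar that knowledge has to be written into the repeated rule by hand.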
PEGs are greedy by design, which is a consequence of the fact that PEGs do not backtrack into a repetition once it has matched: an ordered choice can still fall through to its next alternative, but a repetition never gives characters back. That goes back to the underlying assumption of PEGs that there is one (and only one!) ‘correct’ way to parse the input. Allowing that kind of backtracking would destroy the assumption, and it would complicate the system to a level comparable to PCRE (with all the associated penalties on processing speed and a much greater codebase).
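Within those rules, a regex-like "b+b" can still be spelled out as a recursive rule with an ordered choice, because a failed alternative makes LPeg try the next one at the same position. This is only a sketch of that idiom; the rule name `S` and the variable `at_least_two_b` are invented for the example.

```lua
local lpeg = require "lpeg"
local P, V = lpeg.P, lpeg.V

-- S <- "b" S / "b" "b"
-- Take one "b" and recurse; if the rest cannot be matched that way,
-- fall back to matching exactly two "b"s at the current position.
local at_least_two_b = P{
  "S",
  S = P"b" * V"S" + P"b" * P"b",
} * -1

print(at_least_two_b:match("bbb"))   -- 4
print(at_least_two_b:match("bb"))    -- 3
print(at_least_two_b:match("b"))     -- nil
```

The choice is still ordered and deterministic: the recursion is always tried first, and the second alternative is only reached when the first one fails.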
Greedy vs. non-greedy is one of the things that I always keep in mind when I start with lpeg, and I regularly fail to apply it, because I think in the "perl regex way". Anyway, http://www.gammon.com.au/lpeg has some good lines, e.g. this one (from the lpeg site), which finds the pattern anywhere in the line:

    local lpeg = require "lpeg"
    -- example subject; not given in the original message
    local target = "hello dog"

    function anywhere (p)
      -- try p here; otherwise skip one character and try again
      return lpeg.P { p + 1 * lpeg.V(1) }
    end
    print (lpeg.match (anywhere ("dog"), target))

-- luigi