Dear Taco,

On Mon, 21 Dec 2020 at 13:46, Taco Hoekwater wrote:
> On 21 Dec 2020, at 13:16, Mojca Miklavec wrote:
>
>> My only explanation would be that perhaps "^1" is so greedy that the rest of the pattern doesn't get found. But I don't want to believe that explanation.
>
> Which (of course) means that that is exactly what happens ;)
>
> The ones that match are:
>
>   ababbb (a (ba+bb) b) => r4 r1(r3(r5 r4) r2(r5 r5)) r5
>   abbbab (a (bb+ba) b) => r4 r1(r2(r5 r5) r3(r5 r4)) r5
>
> With the ^1, in the “bb” cases the first “b” eats all three “b”s:
>
>   ababbb fails the r5 at the end
>   abbbab fails the first r2 already (since the second r5 therein never happens)
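To make the "first b eats all three b's" effect concrete, here is a minimal sketch in Python. It models PEG matching and is not lpeg itself, and the helper names `lit`, `seq` and `plus` are hypothetical. Each parser returns a single end position or None, so nothing can be handed back later. `plus` mimics `patt^1`, which the lpeg documentation describes as greedy with no backtracking (a possessive repetition). In lpeg terms the failing pattern is roughly `lpeg.P'b'^1 * lpeg.P'b'`.

```python
import re
from typing import Callable, Optional

# Hypothetical PEG model (not lpeg): a parser takes the subject and a start
# position and returns the end position on success, or None on failure.
# It yields exactly one result, so a repetition can never give input back.
Parser = Callable[[str, int], Optional[int]]


def lit(text: str) -> Parser:
    """Match a literal string, like lpeg.P('b')."""
    def parse(s: str, i: int) -> Optional[int]:
        return i + len(text) if s.startswith(text, i) else None
    return parse


def seq(*parts: Parser) -> Parser:
    """Match parts one after another, like lpeg's `*` operator."""
    def parse(s: str, i: int) -> Optional[int]:
        for p in parts:
            nxt = p(s, i)
            if nxt is None:
                return None
            i = nxt
        return i
    return parse


def plus(p: Parser) -> Parser:
    """Model of lpeg's p^1: one or more matches, as many as possible.
    The count is never reduced if a later pattern fails."""
    def parse(s: str, i: int) -> Optional[int]:
        end = p(s, i)
        if end is None:
            return None
        while True:
            nxt = p(s, end)
            if nxt is None or nxt == end:  # stop on failure (or an empty match)
                return end
            end = nxt
    return parse


b = lit("b")
peg_b_plus_b = seq(plus(b), b)  # roughly lpeg.P'b'^1 * lpeg.P'b'

# The regex engine backtracks: "b+" gives one "b" back so the final "b" matches.
print(re.match(r"b+b", "bbb").group())    # bbb
# The PEG model does not: plus(b) consumes all three "b"s, then `b` fails.
print(peg_b_plus_b("bbb", 0))             # None
# With a different continuation the possessive run is harmless.
print(seq(plus(b), lit("a"))("bbba", 0))  # 4
```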
Is this a deliberate choice, a limitation of the grammar's expressiveness, some misuse on my side that could or should be written in a different way, or does it count as a "bug" on the lpeg side?

For example, I wouldn't expect the regexp "b+b" to fail on "bbb" just because "b+" would eat all three "b"s at once (the regexp "b+b" does in fact match "bbb"), and I would expect a less-than-totally-greedy match from lpeg as well. Or is my reasoning wrong here?

It certainly works if I use lpeg.P('b') + lpeg.P('bb') + lpeg.P('bbb') (and a couple more, as long as I can predict the maximum length), but that's not really a viable workaround in general.

Thank you,
Mojca

PS: Sorry, a tiny bug also crept into my sample code. The line after matching 'parser1' should have used 'total1' rather than 'total':

    if lpeg.match(parser1, s) then total1 = total1 + 1 end
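One way to generalise the length-limited alternation is the usual PEG trick of moving the rest of the pattern inside the repetition, as in the rule `S <- 'b' S / 'b' rest`. The minimal sketch below expresses this in a hypothetical Python model of PEG matching, not in lpeg. `lit`, `seq`, `choice` and `plus_then` are illustrative names only. Ordered choice first tries to take another "b". Only when everything after that fails does it stop and try `rest`, which recovers the shorter runs that `^1` never gives back. In lpeg this would presumably be a grammar along the lines of `lpeg.P{ "S", S = lpeg.P'b' * lpeg.V'S' + lpeg.P'b' * rest }`, though that form is untested here.

```python
from typing import Callable, Optional

# Hypothetical PEG model (not lpeg): a parser maps (subject, position)
# to an end position on success, or None on failure.
Parser = Callable[[str, int], Optional[int]]


def lit(text: str) -> Parser:
    """Match a literal string, like lpeg.P('b')."""
    def parse(s: str, i: int) -> Optional[int]:
        return i + len(text) if s.startswith(text, i) else None
    return parse


def seq(*parts: Parser) -> Parser:
    """Match parts one after another, like lpeg's `*` operator."""
    def parse(s: str, i: int) -> Optional[int]:
        for p in parts:
            nxt = p(s, i)
            if nxt is None:
                return None
            i = nxt
        return i
    return parse


def choice(*alts: Parser) -> Parser:
    """Ordered choice, like lpeg's `+`: the first alternative that succeeds wins."""
    def parse(s: str, i: int) -> Optional[int]:
        for p in alts:
            end = p(s, i)
            if end is not None:
                return end
        return None
    return parse


def plus_then(p: Parser, rest: Parser) -> Parser:
    """Hypothetical helper for the recursive rule  S <- p S / p rest.

    Means "one or more p, then rest", but because rest sits inside the
    recursion, a shorter run of p is tried whenever rest fails.
    p must not match the empty string, or the recursion never ends."""
    def S(s: str, i: int) -> Optional[int]:
        return rule(s, i)
    rule = choice(seq(p, S), seq(p, rest))
    return S


b = lit("b")
b_plus_b = plus_then(b, b)

print(b_plus_b("bbb", 0))  # 3: the run takes two "b"s, rest takes the last one
print(b_plus_b("b", 0))    # None: rest needs a "b" after at least one "b"
print(plus_then(b, seq(b, lit("a")))("bbba", 0))  # 4
```

The price is that shorter splits are retried every time `rest` fails, so the worst-case cost grows with the length of the run, much as it does in a backtracking regex engine.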