Re: [NTG-context] Off-topic: Struggles with LPEG grammar

21 Dec 2020

Dear Taco,

On Mon, 21 Dec 2020 at 13:46, Taco Hoekwater wrote:
...
...
> On 21 Dec 2020, at 13:16, Mojca Miklavec wrote:
>> My only explanation would be that perhaps "^1" is so greedy that the
>> rest of the pattern doesn't get found. But I don't want to believe
>> that explanation.
>
> Which (of course) means that that is exactly what happens ;)
> The ones that match are
> ababbb (a (ba+bb) b) => r4 r1(r3(r5 r4) r2(r5 r5)) r5
> abbbab (a (bb+ba) b) => r4 r1(r2(r5 r5) r3(r5 r4)) r5
> With the ^1, in the “bb” cases the first “b” eats all three “b”s:
> ababbb fails the r5 at the end
> abbbab fails the first r2 already (since the second r5 therein never happens)

Is this a deliberate choice, a limitation of the grammar
expressiveness, some misuse on my side that could/should/needs to be
implemented in a different way, or does it count as a "bug" on the
lpeg side?

For example, I wouldn't expect a regexp "b+b" to fail on "bbb" just
because "b+" would eat all three "b"s at once (the regexp "b+b" in
fact finds "bbb", and I would expect a less-than-totally-greedy hit
with lpeg as well). Or is my reasoning wrong here?
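
To make the comparison concrete, here is a minimal sketch, assuming the standard `lpeg` module is installed. The variable names are invented for illustration and are not from the original sample code. It contrasts lpeg's repetition, which never gives back what it has consumed, with Lua's own string patterns, where "+" does backtrack. The last pattern uses a lookahead so that the repetition always leaves one "b" behind:

```lua
local lpeg = require("lpeg")

-- "b"^1 consumes every "b" and does not backtrack, so the
-- trailing "b" finds nothing left to match.
local possessive = lpeg.P("b")^1 * lpeg.P("b") * -1

-- Only consume a "b" while another "b" follows; the repetition
-- then stops one "b" early and the trailing "b" can match.
local lookahead = (lpeg.P("b") * #lpeg.P("b"))^1 * lpeg.P("b") * -1

print(lpeg.match(possessive, "bbb"))  -- nil
print(string.find("bbb", "^b+b$"))    -- 1  3
print(lpeg.match(lookahead, "bbb"))   -- 4
```

The lookahead variant only works when what follows the repetition is known at that point in the pattern, so it is an illustration rather than a general fix.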

It certainly works if I use
    lpeg.P('b') + lpeg.P('bb') + lpeg.P('bbb') -- and a couple more
(as long as I can predict the maximum length)
but that's not really a viable workaround in general.
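
A possible length-independent alternative, as a sketch only: repetition in lpeg does not backtrack, but ordered choice inside a recursive rule does retry its second alternative when the first one fails. Writing "b+ b" as the rule S <- 'b' S / 'b' 'b' !. therefore behaves like the backtracking regexp. The rule name `S` and the variable `b_plus_b` are hypothetical, and this is not necessarily how the original grammar should be restructured:

```lua
local lpeg = require("lpeg")
local P, V = lpeg.P, lpeg.V

-- S <- 'b' S / 'b' 'b' !.
-- First try to consume one "b" and recurse; if the rest cannot be
-- matched that way, fall back to the final "b" "b" at end of input.
local b_plus_b = P{
    "S",
    S = P("b") * V("S") + P("b") * P("b") * -1,
}

print(lpeg.match(b_plus_b, "bb"))    -- 3
print(lpeg.match(b_plus_b, "bbb"))   -- 4
print(lpeg.match(b_plus_b, "b"))     -- nil
```

The continuation (here the final "b" and the end-of-input check) has to sit inside the recursive rule; placed after the grammar, it would again run only after the rule had already committed to its match.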

Thank you,
    Mojca

PS: sorry, a tiny bug also crept into my sample code. The line
after matching the 'parser1' should have used 'total1' rather than
'total':
    if lpeg.match(parser1, s) then
        total1 = total1 + 1
    end