> On 21 Dec 2020, at 14:08, Mojca Miklavec <mojca.miklavec.lists@gmail.com> wrote:
>
> Dear Taco,
>
> On Mon, 21 Dec 2020 at 13:46, Taco Hoekwater wrote:
>>> On 21 Dec 2020, at 13:16, Mojca Miklavec wrote:
>>>
>>> My only explanation would be that perhaps "^1" is so greedy that the
>>> rest of the pattern doesn't get found. But I don't want to believe
>>> that explanation.
>>
>> Which (of course) means that that is exactly what happens ;)
>>
>> The ones that match are
>>
>> ababbb (a (ba+bb) b) => r4 r1(r3(r5 r4) r2(r5 r5)) r5
>> abbbab (a (bb+ba) b) => r4 r1(r2(r5 r5) r3(r5 r4)) r5
>>
>> With the ^1, in the “bb” cases the first “b” eats all three “b”s:
>>
>> ababbb fails the r5 at the end
>>
>> abbbab fails the first r2 already (since the second r5 therein never happens)
>
> Is this a deliberate choice, a limitation of the grammar
> expressiveness, some misuse on my side that could/should/needs to be
> implemented in a different way, or does it count as a "bug" on the
> lpeg side?
>
> For example, I wouldn't expect a regexp "b+b" to fail on "bbb" just
> because "b+" would eat all three "b"s at once (the regexp "b+b" in
> fact finds "bbb", and I would expect a less-than-totally-greedy hit
> with lpeg as well). Or is my reasoning wrong here?
PEGs are greedy by design. That is a consequence of the fact that PEGs do not backtrack in the regex sense (a repetition never gives back what it has already matched), which in turn goes back to the underlying assumption of PEGs that there is one (and only one!) ‘correct’ way to parse the input. Allowing backtracking would destroy that assumption and would complicate the system to a level comparable to PCRE (with all the associated penalties on processing speed and a much greater codebase).
Greedy vs non-greedy is one of the things that I always try to keep in mind when I start with lpeg, and I regularly fail to apply it, because I think in the "Perl regex way".
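To make the "b+b" case from the question concrete, here is a minimal sketch, assuming the lpeg module is installed. The names greedy, lookahead and recursive are only illustrative and are not taken from the grammar discussed above. It shows the plain translation failing on "bbb", and two possible ways to get the regex-like result without backtracking:

```lua
local lpeg = require "lpeg"
local P, V = lpeg.P, lpeg.V

local b = P"b"

-- Direct translation of the regex "b+b": b^1 consumes every "b" and
-- never gives one back, so the trailing b has nothing left to match.
local greedy = b^1 * b

-- Option 1: only consume a "b" inside the loop when another "b" follows
-- (#b is a lookahead that consumes no input).
local lookahead = (b * #b)^1 * b

-- Option 2: a recursive grammar; ordered choice retries the second
-- alternative at the same position when the first one fails.
local recursive = P{ b * V(1) + b * b }

print(lpeg.match(greedy, "bbb"))     -- nil
print(lpeg.match(lookahead, "bbb"))  -- 4 (position after the match)
print(lpeg.match(recursive, "bbb"))  -- 4
```

Both alternatives state explicitly where the repetition has to stop, which is the usual way to write such patterns in a PEG.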
Anyway,
has some good lines:
e.g. this one (from the lpeg site) finds the pattern anywhere in the line:
function anywhere (p)
  return lpeg.P { p + 1 * lpeg.V(1) }
end

print (lpeg.match (anywhere ("dog"), target))
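For completeness, a self-contained version of the snippet above that can be run as is, assuming the lpeg module is installed. The value of target is a made-up example string, since the original does not define it:

```lua
local lpeg = require "lpeg"

-- Try p at the current position; if that fails, skip one character
-- and try the whole rule again (rule 1 refers to itself).
local function anywhere(p)
  return lpeg.P{ p + 1 * lpeg.V(1) }
end

-- Hypothetical input, not from the original message.
local target = "hot dog"

print(lpeg.match(anywhere("dog"), target))  -- 8 (position after "dog")
print(lpeg.match(anywhere("cat"), target))  -- nil
```

Because p is tried first at every position and a character is skipped only when p fails, this behaves like a non-greedy ".*?dog" in Perl terms, without any backtracking.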
--
luigi