On Tue, 12 Feb 2008, Hans Hagen wrote:
Thomas A. Schmitz wrote:
Hi Aditya,
now I need to apologize for being so slow to reply: thanks a lot, this looks really fascinating! I don't know how many things I've read on the web trying to find out whether regexps can handle nested delimiters (I think the long and short of it was that, on some mathematical principle, it just isn't possible); there is some pretty obscure Perl stuff that might be able to do it, but it is highly experimental. If gema really can do this, it should be a godsend for processing TeX files. I have it installed now on my OS X box (though I couldn't build the gel binary) and am looking forward to experimenting with it.
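The "mathematical principle" is that regular expressions in the strict sense describe regular languages, which a finite automaton recognizes; a finite automaton has no unbounded memory, so it cannot verify that every `{` has its `}` at arbitrary depth. A minimal Python sketch of what that means in practice (the function names are made up for illustration): a pure regex can be written out for a fixed maximum depth, while a single counter handles any depth.

```python
import re

def brace_regex(max_depth: int) -> str:
    """Regex for a {...} group nested at most max_depth levels (outermost = 1).

    Each extra level is written out by hand: the pattern grows with the
    depth it has to support, and no finite pattern covers every depth.
    """
    body = r"[^{}]*"                       # innermost level: no braces
    for _ in range(max_depth - 1):
        body = r"(?:[^{}]|\{" + body + r"\})*"
    return r"\{" + body + r"\}"

def is_balanced(text: str) -> bool:
    """One unbounded counter is enough to check balance at any depth."""
    depth = 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:                  # a '}' with no open '{'
                return False
    return depth == 0

three_levels = re.compile(brace_regex(3))
print(bool(three_levels.fullmatch("{a{b{c}}}")))     # True
print(bool(three_levels.fullmatch("{a{b{c{d}}}}")))  # False: one level too deep
print(is_balanced("{a{b{c{d}}}}"))                   # True
```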
lua's pattern matcher can handle nested {} (syntax: %b{} and such)
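In Lua's pattern syntax, `%bxy` matches a string that starts with `x`, ends with `y`, and in which `x` and `y` are balanced: reading left to right and counting +1 for each `x` and -1 for each `y`, the match ends at the first `y` where the count returns to 0. The Python sketch below imitates `%b{}` and a `gmatch`-style scan; it illustrates the rule and is not Lua's implementation (the function names are hypothetical).

```python
from typing import List, Optional

def match_balanced(s: str, start: int, open_ch: str = "{", close_ch: str = "}") -> Optional[int]:
    """Mimic %b{} anchored at `start`: return the index just past the match, or None."""
    if start >= len(s) or s[start] != open_ch:
        return None
    depth = 0
    for i in range(start, len(s)):
        if s[i] == open_ch:
            depth += 1
        elif s[i] == close_ch:
            depth -= 1
            if depth == 0:                 # first close that brings the count back to 0
                return i + 1
    return None                            # never balanced: no match

def find_balanced_groups(s: str) -> List[str]:
    """Roughly what string.gmatch(s, "%b{}") yields: successive non-overlapping groups."""
    groups, i = [], 0
    while i < len(s):
        end = match_balanced(s, i)
        if end is None:
            i += 1
        else:
            groups.append(s[i:end])
            i = end
    return groups

print(find_balanced_groups(r"\section{A {\bf bold} title} and \emph{x}"))
# ['{A {\\bf bold} title}', '{x}']
```

Because the counter treats every `{` alike, an escaped brace throws it off: `match_balanced(r"{bla\{bla}", 0)` returns `None`.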
more clever things can be done with lpeg, e.g. coping with escaped braces as in bla {bla\{bla} and such
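Handling `{bla\{bla}` means treating a backslash plus the following character as one unit, which a plain balance counter cannot express. LPeg builds parsers from Parsing Expression Grammars (PEGs), where such a rule is natural. The sketch below is not LPeg: it is a hand-written Python recursive-descent parser for an assumed PEG-style grammar (shown in the comment), with one recursive call per nested group.

```python
from typing import Optional

# Assumed PEG-style grammar (illustrative, not LPeg syntax):
#   group  <- '{' (escape / group / [^{}\\])* '}'
#   escape <- '\\' .      (a backslash and the character after it)
def parse_group(s: str, i: int = 0) -> Optional[int]:
    """Parse a brace group starting at s[i]; return the index just past it, or None."""
    if i >= len(s) or s[i] != "{":
        return None
    i += 1
    while i < len(s):
        ch = s[i]
        if ch == "\\":
            if i + 1 >= len(s):
                return None                # dangling backslash at end of input
            i += 2                         # escape: \{ and \} are not delimiters
        elif ch == "{":
            end = parse_group(s, i)        # nested group: recurse
            if end is None:
                return None
            i = end
        elif ch == "}":
            return i + 1                   # this '}' closes our group
        else:
            i += 1                         # ordinary character
    return None                            # input ended before the group closed

text = r"{bla\{bla}"
print(parse_group(text) == len(text))      # True: the escaped brace is skipped
```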
if you're up to date you may try
mtxrun --script check sometexfile.tex
this is a (for the moment simple) syntax checker I wrote a while ago which shows the principles
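A toy brace checker in the same spirit, as a hypothetical Python sketch: it reports unmatched braces with line numbers, skips control symbols such as `\{` and `\%`, and ignores `%` comments. It is not the code behind `mtxrun --script check`, whose checks may well differ.

```python
from typing import List

def check_braces(text: str) -> List[str]:
    """Toy TeX brace checker: report unmatched braces with their line numbers."""
    problems: List[str] = []
    open_lines: List[int] = []             # line number of each still-open '{'
    for lineno, line in enumerate(text.splitlines(), start=1):
        i = 0
        while i < len(line):
            ch = line[i]
            if ch == "\\":
                i += 2                     # skip control symbols like \{ \} \% \\
                continue
            if ch == "%":
                break                      # unescaped %: rest of line is a comment
            if ch == "{":
                open_lines.append(lineno)
            elif ch == "}":
                if open_lines:
                    open_lines.pop()
                else:
                    problems.append(f"line {lineno}: unexpected '}}'")
            i += 1
    for lineno in open_lines:
        problems.append(f"line {lineno}: unclosed '{{'")
    return problems

sample = "\\section{Intro}\n{\\bf bold % } is only a comment\nsome text \\}\n"
print(check_braces(sample))               # ["line 2: unclosed '{'"]
```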
I have been looking at different ways to parse TeX syntax, since I occasionally do ConTeXt -> LaTeX conversion. Tools like gema and regexes are OK for small things: e.g., converting ConTeXt section commands to LaTeX section commands, converting figures, etc. Gema is better if you also want to convert ConTeXt font commands to LaTeX, since it makes nested conversions easier to write. However, both fail miserably if you want to convert things like ConTeXt multi-line math statements to LaTeX. For that, a real parser is needed. I have looked at Parsec (and the Pandoc project) in Haskell, but have not made much progress there. Maybe lpeg is an easier parser to understand. (But I sometimes get the feeling that the whole thing would be easier in TeX, since TeX already parses itself :)

Aditya
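To make the distinction above concrete, a hedged Python sketch: a regex is enough for ConTeXt's unnumbered `\subject` / `\subsubject` (mapped here to LaTeX `\section*` / `\subsection*`) as long as the title contains no braces, while turning font switches like `{\bf ...}` into `\textbf{...}` needs brace matching plus recursion for nested switches. The mappings and function names are assumptions for illustration; multi-line math is left out, since that is where a real parser is needed.

```python
import re

# Hypothetical mappings, for illustration only.
SECTION_MAP = {"subject": "section*", "subsubject": "subsection*"}
FONT_MAP = {"bf": "textbf", "it": "textit"}

SECTION_RE = re.compile(r"\\(subsubject|subject)\{([^{}]*)\}")
# '{\bf' or '{\it' not followed by another letter (so '{\bfseries' is left alone);
# the spaces TeX skips after a control word are swallowed too.
FONT_OPEN = re.compile(r"\{\\(bf|it)(?![A-Za-z])\s*")

def convert_sections(src: str) -> str:
    """Regex-only conversion; works only when the title contains no braces."""
    return SECTION_RE.sub(
        lambda m: "\\" + SECTION_MAP[m.group(1)] + "{" + m.group(2) + "}", src)

def closing_brace(s: str, start: int) -> int:
    """Index of the '}' matching the '{' at s[start], or -1; skips backslash escapes."""
    depth, i = 0, start
    while i < len(s):
        if s[i] == "\\":
            i += 2
            continue
        if s[i] == "{":
            depth += 1
        elif s[i] == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return -1

def convert_fonts(src: str) -> str:
    """Turn {\\bf ...} / {\\it ...} into \\textbf{...} / \\textit{...}, recursing into the body."""
    out, i = [], 0
    while i < len(src):
        if src[i] == "\\":
            out.append(src[i:i + 2])       # copy control symbols such as \{ untouched
            i += 2
            continue
        m = FONT_OPEN.match(src, i)
        if m:
            end = closing_brace(src, i)
            if end != -1:
                inner = convert_fonts(src[m.end():end])   # nested switches
                out.append("\\" + FONT_MAP[m.group(1)] + "{" + inner + "}")
                i = end + 1
                continue
        out.append(src[i])
        i += 1
    return "".join(out)

print(convert_sections(r"\subject{Intro}"))        # \section*{Intro}
print(convert_fonts(r"{\bf A {\it B} C}"))         # \textbf{A \textit{B} C}
```

A title such as `\subject{A {\bf B}}` passes through `convert_sections` unchanged, which is exactly where plain regexes stop being enough.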