Aditya Mahajan wrote:
I have been looking at different ways to parse TeX syntax, since I occasionally do ConTeXt -> LaTeX conversion. Tools like gema and regexes are OK for small things: e.g., converting ConTeXt section commands to LaTeX section commands, converting figures, etc. Gema is better if you also want to convert ConTeXt font commands to LaTeX, since it is easier to write nested conversions. However, both fail miserably if you want to convert things like ConTeXt multi-line math statements to LaTeX. For that a real parser is needed. I have looked at Parsec (and the Pandoc project) in Haskell, but have not made much progress there. Maybe lpeg is an easier-to-understand parser. (But I sometimes get the feeling that the whole thing will be easier in TeX, since TeX already parses itself :)
This is actually pretty easy; I did that for a TeX->XML conversion once. You have to redefine each and every command and make all special chars like $ and _ \active, but it is fairly reliable. I would not do it like that again (these days I would use lpeg), but it was not nearly as complicated to do in TeX macros as I had anticipated.

Best wishes,
Taco
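
To make the "nested conversions" point concrete, here is a minimal sketch of the parser-based approach. It is a small hand-written recursive-descent parser in Python, standing in for lpeg or Parsec, not an example of either. The function names, the FONT_SWITCHES table, and the choice of LaTeX targets are all assumptions made for illustration, and the tokenizer ignores comments, math mode, and catcode changes.

```python
import re

# Tokens: control word, control symbol, lone backslash, brace, or plain text.
TOKEN = re.compile(r"\\[A-Za-z]+|\\.|\\|[{}]|[^\\{}]+", re.DOTALL)

# Illustrative mapping only (an assumption, not a complete table).
FONT_SWITCHES = {r"\bf": r"\textbf", r"\em": r"\emph", r"\it": r"\textit"}


def tokenize(src):
    return TOKEN.findall(src)


def parse(tokens, pos=0, depth=0):
    """Return (nodes, next_pos); each brace group becomes a nested list."""
    nodes = []
    while pos < len(tokens):
        tok = tokens[pos]
        if tok == "{":
            group, pos = parse(tokens, pos + 1, depth + 1)
            nodes.append(group)
        elif tok == "}":
            if depth == 0:
                raise ValueError("unbalanced '}'")
            return nodes, pos + 1
        else:
            nodes.append(tok)
            pos += 1
    if depth:
        raise ValueError("missing '}'")
    return nodes, pos


def render(nodes):
    """Turn the tree back into text, rewriting groups on the way."""
    return "".join(
        render_group(n) if isinstance(n, list) else n for n in nodes
    )


def render_group(group):
    # {\bf ...} becomes \textbf{...}; inner groups are handled recursively.
    if group and isinstance(group[0], str) and group[0] in FONT_SWITCHES:
        body = render(group[1:]).lstrip()
        return FONT_SWITCHES[group[0]] + "{" + body + "}"
    return "{" + render(group) + "}"


def convert(src):
    nodes, _ = parse(tokenize(src))
    return render(nodes)


print(convert(r"{\bf bold {\em and emphasized} text}"))
# prints: \textbf{bold \emph{and emphasized} text}
```

Because the parser builds a tree before anything is rewritten, the inner group is converted independently of the outer one. That is the step a flat regex substitution cannot do reliably once braces nest.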