wikimedia2context: any existing solutions?
Hello,

Before I start reinventing the wheel ... I have a feeling that some people have already done some basic wikimedia2context syntax conversion.

I would like to create a PDF out of some wiki pages that use a very limited number of commands. I have created a simple Ruby script that fetches all the contents that I want in the final PDF; all that is left to be done is the conversion from wiki to TeX syntax:
- replace =...= with \section{...}, ==...== with \subsection{...}, ===...=== with \subsubsection{...}, ...
- replace ''...'' with {\it ...}, '''...''' with {\bf ...}, '''''...''''' with {\bi ...}
- print all lines starting with a space verbatim
- turn lines starting with * into a bulleted itemize
- turn lines starting with # into a numbered itemize
- some trivial replacements like >
- some links: [[abc def]] should become links to the beginning of the section with that title
- [[Image:chap1-f2.jpg|frame|Figure 1.2: Cylindrical scanner]] should become \placefigure{Cylindrical scanner}{\externalfigure[chap1-f2.jpg]}
- a few tables

Maybe there is more, but I think that this covers the majority of the content. The solution doesn't have to be too robust, and I don't care what language it is written in (I just need a printed manual and have no problem manually tweaking the pitfalls after the conversion if needed). I can start writing regular expressions, but if somebody has an almost-ready-to-use solution, that would be much better than doing everything from scratch. (A Lua function that would simply read in a plain wiki file would be nice, but I have never tried to gain a deep understanding of "parsing" in Lua.)

Thanks a lot,
Mojca
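A minimal sketch of the line-based conversion described above, in plain Lua with string.gsub (file handling and names are hypothetical; it covers only headings, emphasis, verbatim lines and one-level lists, not links, images or tables):

    -- wikiconvert.lua -- hypothetical sketch of the conversion wished
    -- for above; nowhere near a full mediawiki parser.

    -- Inline markup. The five-quote pattern must be handled before the
    -- three- and two-quote ones, or it would be misread as bold/italic.
    local function inline(s)
      s = s:gsub("'''''(.-)'''''", "{\\bi %1}")
      s = s:gsub("'''(.-)'''",     "{\\bf %1}")
      s = s:gsub("''(.-)''",       "{\\it %1}")
      return s
    end

    local open  = { ul = "\\startitemize", ol = "\\startitemize[n]",
                    verb = "\\starttyping" }
    local close = { ul = "\\stopitemize",  ol = "\\stopitemize",
                    verb = "\\stoptyping" }
    local levels = { "section", "subsection", "subsubsection" }

    local function convert(text)
      local out, mode = {}, nil
      local function switch(m)     -- open/close block environments
        if mode ~= m then
          if mode then out[#out+1] = close[mode] end
          if m    then out[#out+1] = open[m]     end
          mode = m
        end
      end
      for line in (text .. "\n"):gmatch("(.-)\n") do
        local eqs, title = line:match("^(=+)%s*(.-)%s*=+%s*$")
        if eqs then                        -- =...= headings
          switch(nil)
          out[#out+1] = ("\\%s{%s}"):format(
            levels[#eqs] or "subsubsection", inline(title))
        elseif line:find("^ ") then        -- leading space: verbatim
          switch("verb"); out[#out+1] = line
        elseif line:find("^%*") then       -- bulleted list
          switch("ul"); out[#out+1] = "\\item " .. inline(line:sub(2))
        elseif line:find("^#") then        -- numbered list
          switch("ol"); out[#out+1] = "\\item " .. inline(line:sub(2))
        else
          switch(nil); out[#out+1] = inline(line)
        end
      end
      switch(nil)
      return table.concat(out, "\n")
    end

    io.write(convert(io.read("*a")), "\n")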
On Wed, Mar 30, 2011 at 04:47:07PM +0200, Mojca Miklavec wrote:
Hello,
Before I start reinventing the wheel ... I have a feeling that some people have already done some basic wikimedia2context syntax conversion.
I would like to create a PDF out of some wiki pages that use a very limited number of commands. I have created a simple Ruby script that fetches all the contents that I want in the final PDF; all that is left to be done is the conversion from wiki to TeX syntax:
- replace =...= with \section{...}, ==...== with \subsection{...}, ===...=== with \subsubsection{...}, ...
- replace ''...'' with {\it ...}, '''...''' with {\bf ...}, '''''...''''' with {\bi ...}
- print all lines starting with a space verbatim
- turn lines starting with * into a bulleted itemize
- turn lines starting with # into a numbered itemize
- some trivial replacements like >
- some links: [[abc def]] should become links to the beginning of the section with that title
- [[Image:chap1-f2.jpg|frame|Figure 1.2: Cylindrical scanner]] should become \placefigure{Cylindrical scanner}{\externalfigure[chap1-f2.jpg]}
- a few tables
If you are comfortable with writing a PEG grammar (I'm not), writing a mediawiki parser for lunamark[1] might be a good choice; it already has a ConTeXt writer (and a markdown parser). I bet pandoc has mediawiki support as well, so you may try it.

[1] https://github.com/jgm/lunamark

Regards,
Khaled
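For a taste of the PEG approach, a tiny LPEG fragment (a sketch, not lunamark's actual grammar; it only recognizes the quote-based emphasis and assumes the lpeg module, which ships with LuaTeX and is available via luarocks):

    local lpeg = require("lpeg")
    local P, C, Cs = lpeg.P, lpeg.C, lpeg.Cs

    -- capture text up to (but not including) a given closing marker
    local function upto(p) return C((1 - P(p))^1) end

    local bolditalic = P"'''''" * upto("'''''") * P"'''''" /
                       function(s) return "{\\bi " .. s .. "}" end
    local bold       = P"'''"   * upto("'''")   * P"'''" /
                       function(s) return "{\\bf " .. s .. "}" end
    local italic     = P"''"    * upto("''")    * P"''" /
                       function(s) return "{\\it " .. s .. "}" end

    -- try the longest marker first, copy everything else through
    local inline = Cs((bolditalic + bold + italic + C(1))^0)

    print(lpeg.match(inline, "plain ''italic'' and '''bold''' text"))
    -- -> plain {\it italic} and {\bf bold} text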
On Wed, Mar 30, 2011 at 17:16, Khaled Hosny wrote:
On Wed, Mar 30, 2011 at 04:47:07PM +0200, Mojca Miklavec wrote:
If you are comfortable with writing a PEG grammar (I'm not), writing a mediawiki parser for lunamark[1] might be a good choice; it already has a ConTeXt writer (and a markdown parser).
This seems like a very reasonable solution; however, it will take too long before I understand LPEG well enough to write some useful code.
I bet pandoc has mediawiki support as well, so you may try it.
I started installing it, but then realized that it only supports output to mediawiki, no input. It seems like writing my own parser (a few regular expressions in a language that is not Lua) will probably be the fastest solution after all.

Mojca
On Wed, 30 Mar 2011, Mojca Miklavec wrote:
On Wed, Mar 30, 2011 at 17:16, Khaled Hosny wrote:
On Wed, Mar 30, 2011 at 04:47:07PM +0200, Mojca Miklavec wrote:
If you are comfortable with writing a PEG grammar (I'm not), writing a mediawiki parser for lunamark[1] might be a good choice; it already has a ConTeXt writer (and a markdown parser).
This seems like a very reasonable solution; however, it will take too long before I understand LPEG well enough to write some useful code.
I bet pandoc has mediawiki support as well, so you may try it.
I started installing it, but then realized that it only supports output to mediawiki, no input.
It seems like writing my own parser (a few regular expressions in a language that is not Lua) will probably be the fastest solution after all.
Why not work with the HTML output instead? It is easier to convert HTML to ConTeXt (either using the built-in XML parser or pandoc).

Aditya
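If the pages are fetched as rendered HTML, the pandoc route can be as small as a single shell call; a sketch wrapped in Lua for consistency with the other snippets (file names are hypothetical; the html reader and context writer are documented pandoc options):

    -- run pandoc over the fetched HTML page; -s asks for a complete
    -- ConTeXt document rather than a bare fragment
    os.execute("pandoc -s -f html -t context -o manual.tex manual.html")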
On Wed, Mar 30, 2011 at 07:32:35PM +0200, Mojca Miklavec wrote:
On Wed, Mar 30, 2011 at 17:16, Khaled Hosny wrote:
On Wed, Mar 30, 2011 at 04:47:07PM +0200, Mojca Miklavec wrote:
If you are comfortable with writing a PEG grammar (I'm not), writing a mediawiki parser for lunamark[1] might be a good choice; it already has a ConTeXt writer (and a markdown parser).
This seems like a very reasonable solution; however, it will take too long before I understand LPEG well enough to write some useful code.
I bet pandoc has mediawiki support as well, so you may try it.
I started installing it, but then realized that it only supports output to mediawiki, no input.
It seems like writing my own parser (a few regular expressions in a language that is not Lua) will probably be the fastest solution after all.
There is also http://sourceforge.net/projects/wiki2tex/ but it generates LaTeX; tweaking it to generate ConTeXt should not be hard (as long as you can build it: it is written in C++ and requires cmake, Qt and whatnot; luckily it built here just fine).

Regards,
Khaled
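If the LaTeX that wiki2tex emits is regular enough, the retargeting could itself be a handful of replacements; a hypothetical Lua post-processor (the left-hand sides are guesses and would need checking against wiki2tex's real output):

    -- latex2ctx.lua -- hypothetical post-processor mapping a few
    -- common LaTeX environments onto ConTeXt counterparts.
    -- Brace-delimited inline commands such as \textbf{...} need
    -- brace-aware handling and are deliberately left out.
    local map = {
      ["\\begin{itemize}"]   = "\\startitemize",
      ["\\end{itemize}"]     = "\\stopitemize",
      ["\\begin{enumerate}"] = "\\startitemize[n]",
      ["\\end{enumerate}"]   = "\\stopitemize",
      ["\\begin{verbatim}"]  = "\\starttyping",
      ["\\end{verbatim}"]    = "\\stoptyping",
      ["\\begin{document}"]  = "\\starttext",
      ["\\end{document}"]    = "\\stoptext",
    }
    local text = io.read("*a")
    for latex, ctx in pairs(map) do
      -- escape pattern magic characters so gsub does a plain replacement
      text = text:gsub(latex:gsub("%W", "%%%0"), ctx)
    end
    io.write(text)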
On Wed, Mar 30, 2011 at 20:44, Khaled Hosny wrote:
There is also http://sourceforge.net/projects/wiki2tex/ but it generates LaTeX; tweaking it to generate ConTeXt should not be hard (as long as you can build it: it is written in C++ and requires cmake, Qt and whatnot; luckily it built here just fine).
This one works out pretty nicely and compiles out of the box (the TeX code is not perfect, but all the examples compiled). The parser needs some tweaking for special cases that were not handled by the author, and some LaTeX needs to be converted to ConTeXt (minor issues), but it indeed seems nice. (The major problem is that it lacks any documentation, but that can be circumvented.) Thanks a lot.

As for why I prefer wiki to HTML as the main source: wiki is somewhat more basic and has a bit more structure. Even if I start from HTML, I hardly have any less work.

Mojca

PS: now I only need to figure out how to compile the program (for which I'm trying to prepare an acceptable/printable/readable version of the manual) without crashing ... :)
Well, that seems like a great idea. But beware: as far as I know it is impossible with ConTeXt to process CALS tables in an HTML or XML document. It is possible, though, to process CALS tables in a separate document and insert the resulting PDF.

Regards,
Robert

On 30 Mar 2011, at 19:32, Mojca Miklavec wrote:
On Wed, Mar 30, 2011 at 17:16, Khaled Hosny wrote:
On Wed, Mar 30, 2011 at 04:47:07PM +0200, Mojca Miklavec wrote:
If you are comfortable with writing a PEG grammar (I'm not), writing a mediawiki parser for lunamark[1] might be a good choice; it already has a ConTeXt writer (and a markdown parser).
This seems like a very reasonable solution; however, it will take too long before I understand LPEG well enough to write some useful code.
I bet pandoc has mediawiki support as well, so you may try it.
I started installing it, but then realized that it only supports output to mediawiki, no input.
It seems like writing my own parser (a few regular expressions in a language that is not Lua) will probably be the fastest solution after all.
Mojca
participants (4):
- Aditya Mahajan
- Khaled Hosny
- Mojca Miklavec
- R. Ermers