[NTG-context] DOC/RTF to ConTeXt via XML
dh at capdm.com
Wed Sep 28 10:54:39 CEST 2005
> No need for rtf. That would lose lots of information anyway, wouldn't it?
RTF can capture everything that .doc can (MS update it every time they
revise the .doc format), and it has the advantage that it is defined in
a spec with a grammar, which means that import routines (like the one
in OO.o) tend to be better than those for the binary .doc format. So I
would usually choose .rtf under Save As... in Word, rather than relying
on OO.o's reverse engineering of the .doc format. Others' experiences may
vary, of course, and perhaps I do an injustice to OO.o's Word imports,
which have certainly improved. But RTF is a fairly safe bet, and
additionally it is 'human readable' so that helps debugging.
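To illustrate the 'human readable' point: RTF source is plain text made of groups in braces and backslash-prefixed control words, so a few lines of script are enough to see what a file contains. The following is only a minimal sketch; the function name and the sample string are made up for illustration, and it ignores embedded binary data and other corners of the spec.

```python
import re
from collections import Counter

# A control word is a backslash, lowercase letters, and an optional signed
# number; a control symbol is a backslash plus one non-letter character.
TOKEN = re.compile(r"\\(?:([a-z]+)(-?\d+)?|[^a-z])")

def control_words(rtf_text):
    """Count the control words in a string of RTF source."""
    counts = Counter()
    for match in TOKEN.finditer(rtf_text):
        word = match.group(1)
        if word is not None:  # skip control symbols such as \{ or \\
            counts[word] += 1
    return counts

# Made-up sample, not taken from a real Word export.
sample = r"{\rtf1\ansi{\fonttbl{\f0 Times;}}\pard\b Title\b0\par Body \{text\}\par}"
print(control_words(sample).most_common(3))
```

A tally like this makes it easy to spot, say, a document that is full of direct formatting rather than style references.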
>>converting open office xml is not always easy; stay away from tabs and use
>>high level constructs as much as possible
I would add to this - make sure you use either OO.o 1.1.5 or a 2.0 Beta,
since earlier versions used a file format which was a lot trickier to
post-process (problems with conflating styles into paragraph formats).
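As a rough way to check how cleanly styles come through before post-processing: an OO.o document is a zip archive whose text lives in content.xml, so the paragraph style names can be tallied with a short script. This is a sketch under assumptions: the function names are invented here, and it matches on local names (p, style-name) rather than full namespaces, since the namespace URIs differ between the 1.x and 2.0 file formats.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

def local(name):
    """Strip the '{namespace}' prefix ElementTree puts on names."""
    return name.rsplit("}", 1)[-1]

def paragraph_styles(path_or_file):
    """Count paragraphs per style name in an OO.o document's content.xml."""
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    counts = Counter()
    for element in root.iter():
        if local(element.tag) != "p":
            continue  # only paragraph elements are of interest
        for attr, value in element.attrib.items():
            if local(attr) == "style-name":
                counts[value] += 1
    return counts
```

Calling paragraph_styles("report.sxw") (a hypothetical file name) returns a Counter; a long tail of automatic names next to a handful of named styles suggests the document relies on direct formatting rather than high level constructs.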
>>Once I get a sane xml file (this seems to be the biggest problem) what is the
>>best tool to convert this to ConTeXt?
Well, you might not need to - remember that ConTeXt can process XML
natively now, which is why I suggested you look at the
DocBook-in-ConTeXt project, which uses this feature. You wouldn't
necessarily have to use the DocBook standard, but you could use the
principles of that project to define a nice output from your own
(simple) brand of XML.
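The principle of mapping each element of a simple XML dialect to a piece of ConTeXt markup can be sketched as follows. Note that this is not ConTeXt's native XML mechanism, which does the mapping inside ConTeXt itself; it is a stand-alone Python stand-in that only shows the idea. The element names (doc, title, para, em) are hypothetical, and the sketch does not escape characters that are special to TeX.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names for a home-grown XML dialect, each mapped to
# the ConTeXt source emitted before and after the element's content.
MAPPING = {
    "doc": ("\\starttext\n", "\\stoptext\n"),
    "title": ("\\title{", "}\n\n"),
    "para": ("", "\n\n"),
    "em": ("{\\em ", "}"),
}

def to_context(element):
    """Recursively turn an element into ConTeXt source (no TeX escaping)."""
    before, after = MAPPING.get(element.tag, ("", ""))
    parts = [before, element.text or ""]
    for child in element:
        parts.append(to_context(child))
        parts.append(child.tail or "")  # text that follows the child element
    parts.append(after)
    return "".join(parts)

xml_source = "<doc><title>Report</title><para>Some <em>important</em> text.</para></doc>"
print(to_context(ET.fromstring(xml_source)))
```

The smaller and more regular the set of elements, the shorter such a mapping table stays, which is the argument for keeping your own XML simple.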