Re: [NTG-context] DOC/RTF to ConTeXt via XML
Question: Is it possible to design a doc or rtf template that Open Office can convert to a sane, consistent xml format?
OpenOffice.org does allow you to attach an XSLT stylesheet to an export process, which allows you to do a (limited) transformation from the visual markup that is its native format to a more structured one which you would need. But the biggest challenge is that all word processors are designed for visual editing, meaning that there are, for example, 15 or so different ways to get a bulleted list in Word, creating 15 or so different RTF constructs, and coping with this can be a nightmare.
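To make the "many ways to get a bulleted list" problem concrete, here is a minimal Python sketch of the kind of normalization an export step ends up doing. It assumes the paragraphs have already been pulled out of the document as hypothetical (style name, text) pairs; the style names in BULLET_STYLES are invented for illustration, and the text is assumed to be TeX-safe already. Every paragraph that either carries a known list style or starts with a hand-typed bullet becomes a ConTeXt \item inside one \startitemize ... \stopitemize group.

```python
import re

# Hypothetical list-style names; collect the real ones from your authors' files.
BULLET_STYLES = {"List Bullet", "List Bullet 2", "Bulleted List"}
# A bullet typed by hand: bullet glyph, en dash, asterisk or hyphen, then space.
MANUAL_BULLET = re.compile(r"^\s*[\u2022\u2013*-]\s+")


def is_bullet(style, text):
    """True if the paragraph looks like a list item by style or by content."""
    return style in BULLET_STYLES or MANUAL_BULLET.match(text) is not None


def to_context(paragraphs):
    """Turn (style, text) pairs into ConTeXt, grouping bullets into itemize.

    Text is assumed to be already escaped for TeX (an assumption of this sketch).
    """
    out, in_list = [], False
    for style, text in paragraphs:
        if is_bullet(style, text):
            if not in_list:
                out.append(r"\startitemize")
                in_list = True
            # Drop a hand-typed bullet so only ConTeXt draws the symbol.
            out.append(r"\item " + MANUAL_BULLET.sub("", text, count=1))
        else:
            if in_list:
                out.append(r"\stopitemize")
                in_list = False
            out.append(text)
            out.append("")  # blank line = paragraph break in TeX
    if in_list:
        out.append(r"\stopitemize")
    return "\n".join(out)
```

Each new variant an author invents means another entry in the style set or the pattern, which is exactly why this gets out of hand.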
If the Tremblay approach is rich enough, that would solve a lot of problems! Here is my idea:
1. Give each author a doc/rtf template for formatting their article;
2. Use OpenOffice to convert to xml;
3. Use the Tremblay method (have not tried it yet) to process this in ConTeXt.
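For step 2, it may help to know that OpenOffice.org's own files are ZIP archives with the document body in content.xml, so you can inspect what authors actually did before writing any XSLT. The sketch below (Python, standard library only) counts the paragraph and heading style names used in such a file. It assumes the ODF text namespace; OpenOffice.org 1.x .sxw files use http://openoffice.org/2000/text instead. The helper unexpected_styles is hypothetical, not part of any tool. Direct formatting typically shows up as automatic styles named P1, P2 and so on, so anything outside the template's list is a warning sign.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# ODF text namespace (OpenOffice.org 2.x and later). For OpenOffice.org 1.x
# .sxw files pass text_ns="http://openoffice.org/2000/text" instead.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def paragraph_styles(path, text_ns=TEXT_NS):
    """Count the paragraph (text:p) and heading (text:h) style names used."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    counts = Counter()
    for tag in ("p", "h"):
        for elem in root.iter(f"{{{text_ns}}}{tag}"):
            counts[elem.get(f"{{{text_ns}}}style-name", "(none)")] += 1
    return counts


def unexpected_styles(path, allowed, text_ns=TEXT_NS):
    """Hypothetical template check: styles used that the template does not define."""
    return {name: n for name, n in paragraph_styles(path, text_ns).items()
            if name not in allowed}
```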
The FO approach (Paul Tremblay's focus) is one way to process XML to paginated output, but there are many others. Personally I don't like the FO approach, for a variety of reasons, but I'm sure others have had success with it. But you should also explore DocBook-in-ConTeXt, which uses ConTeXt's native XML processing capabilities. And don't rule out using a separate scripting language to convert XML into ConTeXt as a batch process, since that will give you the ultimate flexibility in accessing all of ConTeXt's abilities.
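As a rough idea of what the separate-script route looks like, here is a minimal recursive converter in Python for a tiny, DocBook-like subset (article, section, title, para, emphasis, itemizedlist, listitem). The element subset and the escape table are assumptions for illustration only; a real DocBook conversion covers far more elements, and the backslash, ^ and ~ characters still need handling.

```python
import xml.etree.ElementTree as ET

# Assumed escapes for TeX specials; backslash, ^ and ~ are NOT handled here
# and need whatever ConTeXt commands you settle on.
SPECIALS = {c: "\\" + c for c in "#$%&_{}"}
HEADINGS = ["section", "subsection", "subsubsection"]  # by nesting depth


def escape(text):
    """Escape TeX specials in a (possibly None) text node."""
    return "".join(SPECIALS.get(ch, ch) for ch in (text or ""))


def convert(elem, depth=0):
    """Recursively turn a small DocBook-like subset into ConTeXt source."""
    if elem.tag == "section":
        depth += 1
    inner = escape(elem.text)
    for child in elem:
        inner += convert(child, depth) + escape(child.tail)
    if elem.tag == "title":
        cmd = HEADINGS[min(max(depth, 1), len(HEADINGS)) - 1]
        return f"\\{cmd}{{{inner.strip()}}}\n\n"
    if elem.tag == "para":
        return inner.strip() + "\n\n"
    if elem.tag == "emphasis":
        return "{\\em " + inner + "}"
    if elem.tag == "itemizedlist":
        return "\\startitemize\n" + inner.strip() + "\n\\stopitemize\n\n"
    if elem.tag == "listitem":
        return "\\item " + inner.strip() + "\n"
    # article, section and unknown elements contribute only their content
    return inner
```

Usage would be something like `convert(ET.parse("article.xml").getroot())`, writing the result to a .tex file; since the whole tree is in your hands, any ConTeXt feature is reachable from here.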
Question: Does the entire journal have to be programmed in xml or can ConTeXt process xml locally? For example, I may have my own article done in ConTeXt mixed with other articles done in rtf=>xml.
You can just put XML into \startXMLdata ... \stopXMLdata blocks. I do this for MathML processing within a larger ConTeXt document.
Any other advice (and/or pitfalls to watch for) would be appreciated. This sounds very promising!
Horses for courses. It's possible to get sucked into things like an FO implementation or an XML conversion and find that you have spent months perfecting it and it only shaves half an hour off your production time! Also, you do tend to have to make compromises in design if you want to be able to process directly from XML. But if you have sufficient throughput and an appropriate design, it can be a real boon.

Hope that helps.

Duncan
Duncan Hothersall wrote:
Question: Is it possible to design a doc or rtf template that Open Office can convert to a sane, consistent xml format?
OpenOffice.org does allow you to attach an XSLT stylesheet to an export process, which allows you to do a (limited) transformation from the visual markup that is its native format to a more structured one
Why „limited“? Complicated things are just, well, a bit complicated to achieve. It is certainly possible to get a structured document from, say, an average xhtml file. I would prefer not to write that code, though. It would be rather boring and full of hard-to-read special cases.
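For the XHTML case, the core step is turning a flat run of headings and paragraphs into nested sections. A minimal Python sketch, assuming well-formed XHTML that has already been parsed and whose body contains only h1 to h6 and p children worth keeping; everything else is skipped, and inline markup is flattened to plain text, which a real converter would not do. The output element names (article, section, title, para) are chosen for illustration.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace such as {http://www.w3.org/1999/xhtml}."""
    return tag.rsplit("}", 1)[-1]


def nest_sections(body):
    """Turn a flat run of <h1>..<h6> and <p> elements into nested <section>s."""
    root = ET.Element("article")
    stack = [(0, root)]  # (heading level, element); the article is level 0
    for elem in body:
        tag = local(elem.tag)
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            level = int(tag[1])
            # Close every open section at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            section = ET.SubElement(stack[-1][1], "section")
            ET.SubElement(section, "title").text = "".join(elem.itertext())
            stack.append((level, section))
        elif tag == "p":
            ET.SubElement(stack[-1][1], "para").text = "".join(elem.itertext())
        # Anything else (tables, styled divs, ...) is the special-case
        # territory mentioned above and is simply skipped in this sketch.
    return root
```

The heading stack is the whole trick; the boring part is everything the last comment waves away.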
which you would need. But the biggest challenge is that all word processors are designed for visual editing, meaning that there are, for example, 15 or so different ways to get a bulleted list in Word, creating 15 or so different RTF constructs, and coping with this can be a nightmare.
Yes, it can. (Although RTF is completely unrelated to this problem, since OOo would read the Word file directly. And the OOo step greatly simplifies the problem, since IIRC the OOo format has just one or maybe two ways of saving bulleted lists. Or were you referring to different bullets?) The stricter your rules for the authors, the easier it is to write the required xslt program. If your authors expect to be able to write chapter headers by manually switching to a font in the range of 20 to 24 pt and typing a number in front, you've got a hell of a coding session ahead of you. If, on the other hand, you take the dictatorial approach of telling them in advance that manual font changes will simply be ignored (maybe apart from pseudo-italics and pseudo-bold, which will be mapped to \em in the end), your code will be much easier, but you may have a problem with the authors.
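The dictatorial approach can be sketched in a few lines of Python. The paragraph representation (a hypothetical list of (style, runs) pairs, each run being (text, italic, bold)) and the TEMPLATE mapping are assumptions for illustration. Paragraphs in unknown styles fall back to body text and are reported, italic and bold both become {\em ...}, and any other manual font change (size, family) is simply not represented and therefore ignored. TeX escaping is left out for brevity.

```python
# Hypothetical template: the only paragraph styles authors may use, mapped
# to ConTeXt heading commands (None = ordinary body text).
TEMPLATE = {"Heading 1": "section", "Heading 2": "subsection", "Text body": None}


def convert_strict(paragraphs, warnings):
    """Convert (style, runs) paragraphs, enforcing the template.

    Unknown styles are treated as body text and reported in `warnings`.
    Italic and bold runs both become emphasis; all other manual font
    changes are not part of the input and so cannot survive.
    """
    out = []
    for style, runs in paragraphs:
        text = "".join("{\\em " + t + "}" if (italic or bold) else t
                       for t, italic, bold in runs)
        if style not in TEMPLATE:
            warnings.append(f"unknown style {style!r}, treated as body text")
        cmd = TEMPLATE.get(style)
        out.append(f"\\{cmd}{{{text}}}" if cmd else text)
    return "\n\n".join(out)
```

A hand-made "chapter heading" in a 24 pt font ends up as a plain paragraph plus a warning, which is the price of the easy code path.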
The FO approach (Paul Tremblay's focus) is one way to process XML to paginated output, but there are many others. Personally I don't like the FO approach, for a variety of reasons, but I'm sure others have had success with it. But you should also explore DocBook-in-ConTeXt, which uses ConTeXt's native XML processing capabilities. And don't rule out
The advantage of using DocBook is that you get a very rich set of capabilities. The disadvantage can be described in almost the same words, plus, as I said before, DocBook is one of the most verbose formats in common use. If you only use the format as an intermediate step, that is irrelevant, but if your authors will send in files that way, it is not.
using a separate scripting language to convert XML into ConTeXt as a batch process, since that will give you the ultimate flexibility in accessing all of ConTeXt's abilities.
Personally, I'd use xslt for that. Navigating the xml tree is extremely easy and writing out text instead of xml is not really a problem.
Question: Does the entire journal have to be programmed in xml or can ConTeXt process xml locally? For example, I may have my own article done in ConTeXt mixed with other articles done in rtf=>xml.
You can just put XML into \startXMLdata ... \stopXMLdata blocks. I do this for MathML processing within a larger ConTeXt document.
I'd approach Idris' problem the other way round: transform the xml files to ConTeXt and leave the ConTeXt files as they are. Then run texexec on the whole thing.
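This suggestion can be sketched as a small Python driver: convert each XML article to a .tex file, leave the ConTeXt articles alone, and write one master file that inputs them all between \starttext and \stoptext. The converter is passed in as a hypothetical callable (an XSLT run or a script of your own), and the master file name is an assumption.

```python
from pathlib import Path


def build_master(articles, convert_xml, master="journal.tex"):
    """Write a master ConTeXt file that inputs every article in order.

    .xml articles are first converted to .tex with `convert_xml`, a
    hypothetical callable taking XML text and returning ConTeXt source;
    .tex articles are used as they are.
    """
    lines = [r"\starttext"]
    for name in articles:
        path = Path(name)
        if path.suffix == ".xml":
            tex = path.with_suffix(".tex")
            source = path.read_text(encoding="utf-8")
            tex.write_text(convert_xml(source), encoding="utf-8")
            path = tex
        # Forward slashes work in \input on every platform.
        lines.append(r"\input " + path.as_posix())
    lines.append(r"\stoptext")
    Path(master).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return master
```

After that, running texexec on the master file typesets the whole issue. Note that a .tex file with the same base name as an .xml article is overwritten, and file names containing spaces would need extra care in \input.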
Any other advice (and/or pitfalls to watch for) would be appreciated. This sounds very promising!
Horses for courses. It's possible to get sucked into things like an FO implementation or an XML conversion and find that you have spent months perfecting it and it only shaves half an hour off your production time!
Amen. Also, don't limit your authors to Word. Offering Word is obviously a requirement, but if you go via OOo, there is no point in not offering an OOo template file as well. If you are using a standard xml format, such as (a subset of) DocBook or TEI, you should probably accept articles in that format, too. And, of course, ConTeXt.

Christopher
participants (2): Christopher Creutzig, Duncan Hothersall