Re: [NTG-context] ConTeXt to RTF Conversion
- xml is not an input format but a (well-structured) interchange format.
XML is a very good master format from which to derive all outputs. For example, I receive wordprocessor files from academics and convert them to XML by a combination of automated processes and hand-tagging. The XML is then stored and maintained as the master version of that document, and when we need to produce a new release in HTML, PDF or eBook, a new style of PDF, a text version optimised for screen-readers, etc. we take a snapshot of the latest XML and run it through batch production processes - including ConTeXt for the typesetting side of things. That might be overkill for small projects (we're currently holding 25 million words in XML), but the principle applies no matter what size of content you have.
Duncan
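The master-format workflow described above can be illustrated with a minimal sketch. It is not Duncan's actual production process: the element names (`article`, `title`, `p`, `em`), the sample text, and the helper functions are all hypothetical, and the TeX escaping covers only a few special characters. It only shows one XML source feeding both an HTML and a ConTeXt output.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape as html_escape

# Hypothetical master document; the element names are illustrative only.
MASTER = """<article>
  <title>On Single Sourcing</title>
  <p>XML is the <em>master</em> copy.</p>
  <p>Every output is derived from it.</p>
</article>"""

# Minimal, non-exhaustive escaping of characters that are special in TeX.
TEX_SPECIALS = str.maketrans({
    "\\": r"\letterbackslash{}",
    "{": r"\{", "}": r"\}", "%": r"\%",
    "&": r"\&", "#": r"\#", "$": r"\$", "_": r"\_",
})

def tex_escape(text):
    return text.translate(TEX_SPECIALS)

def render_inline(elem, escape, em_open, em_close):
    """Render an element's mixed content, wrapping <em> children."""
    out = [escape(elem.text or "")]
    for child in elem:
        inner = render_inline(child, escape, em_open, em_close)
        out.append(em_open + inner + em_close if child.tag == "em" else inner)
        out.append(escape(child.tail or ""))
    return "".join(out)

def to_html(root):
    title = render_inline(root.find("title"), html_escape, "<em>", "</em>")
    lines = ["<h1>" + title + "</h1>"]
    for p in root.findall("p"):
        lines.append("<p>" + render_inline(p, html_escape, "<em>", "</em>") + "</p>")
    return "\n".join(lines)

def to_context(root):
    title = render_inline(root.find("title"), tex_escape, r"{\em ", "}")
    paras = [render_inline(p, tex_escape, r"{\em ", "}") for p in root.findall("p")]
    # A blank line separates paragraphs in the ConTeXt source.
    return "\\starttext\n\\title{" + title + "}\n" + "\n\n".join(paras) + "\n\\stoptext"

root = ET.fromstring(MASTER)
html_output = to_html(root)
context_output = to_context(root)
print(html_output)
print(context_output)
```

Both outputs come from the same parsed tree, so a correction made once in the XML reaches every derived format on the next run.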
On 22 Sept. 05 at 12:23, Duncan Hothersall wrote:
- xml is not an input format but a (well-structured) interchange format.
XML is a very good master format from which to derive all outputs. For example, I receive wordprocessor files from academics and convert them to XML by a combination of automated processes and hand-tagging. The XML is then stored and maintained as the master version of that document, and when we need to produce a new release in HTML, PDF or eBook, a new style of PDF, a text version optimised for screen-readers, etc. we take a snapshot of the latest XML and run it through batch production processes - including ConTeXt for the typesetting side of things.
So if I understand well, I agree that xml is a format for filtering, not a human-writable format. TeX, LaTeX or ConTeXt is an input language, which should be convertible to the powerful master XML format. So we need something to convert ConTeXt to XML more than something to convert it to RTF. I suppose this implies easy conversion to HTML too! So my question was: does anyone have experience with the ConTeXt module "m-tex4ht"?
-- Maurice
Maurice Diamantini wrote:
So if I understand well, I agree that xml is a format for filtering, not a human-writable format. TeX, LaTeX or ConTeXt is an input language, which should be convertible to the powerful master XML format.
No, sorry. This only works for extremely simple TeX code. Forget about any real-world mathematics. Forget about 80% of what real-world LaTeX users type into their computers. TeX was simply never designed to be easily parsed. Besides, our actual users are far too concerned with what their stuff looks like on their screens with their settings to bother about structured information and the like.
Believe me, I have almost finished the translation of our highly structured program documentation files to a DocBook-based XML format, and I am very happy that I decided to make this a one-time conversion, with the automated process only trying to get some 95% or so correct. My experience with the new format (which is still limited; I've been working with it for the last four months or so) leads me to believe that it is no more difficult to use than some TeX dialect. The only slightly awkward thing is that you have to explicitly mark all paragraphs. I don't mind, but if you do, that sort of thing can be scripted.
Short summary: Define an xml format that embeds what you need at the moment. One mistake I made: I didn't go for short names, but used DocBook names. I probably should have started from XHTML, using <p>, <em>, <a> etc. Then use that format as your master and edit in this format. There are orders of magnitude more decent editors to help you with editing all sorts of xml than you will ever find for any TeX variant. (I know, one is sufficient, but finding one that does exactly what *you* want is much easier with more editors to choose from.)
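As for the remark that explicit paragraph marking can be scripted, here is one minimal way it might look. This is a sketch under stated assumptions, not the script used for the documentation project: it assumes plain-text input with paragraphs separated by blank lines and no inline markup yet, and the function name `mark_paragraphs` is made up for illustration.

```python
import re
from xml.sax.saxutils import escape

def mark_paragraphs(text):
    """Wrap each blank-line-separated block of plain text in <p>...</p>."""
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # Join hard-wrapped lines and collapse runs of whitespace.
        content = " ".join(block.split())
        if content:
            # escape() turns &, < and > into entities, so the input
            # must not already contain markup.
            paragraphs.append("<p>" + escape(content) + "</p>")
    return "\n".join(paragraphs)

draft = """First paragraph,
wrapped over two lines.

Second paragraph with R&D in it.
"""
print(mark_paragraphs(draft))
```

The short XHTML-style name `<p>` follows the suggestion above to prefer short element names over DocBook ones.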
So my question was: does anyone have experience with the ConTeXt module "m-tex4ht"?
I do have experience with using tex4ht in LaTeX, which is its native setting. It is definitely much better than all the alternatives I tried, but it does have problems with formulas, it is rather difficult to teach it your new local commands, and the generated HTML code is usable for exactly one thing: rendering in a graphical browser, for us lucky ones without visual impairments. I would not dream of using this pile of mess for anything else. HTML generated by Word simply can't be worse.
regards,
Christopher
Dear gang,
I have followed this discussion with interest. I edit a journal myself.
Despite announcing loudly that it is TeX-friendly, the only person who
writes articles in TeX for it is, you guessed it, myself.
I know next-to-nothing about xml, so I apologize if the next question is
ignorant:
Would it be possible to define an xml format for the journal so that I
could more easily process both ConTeXt/LaTeX articles as well as the docs
and rtfs I generally receive? Is this more work than it's worth? It's a
humanities journal, so little-to-no math.
Best
Idris
On Thu, 22 Sep 2005 22:54:47 +0200, Christopher Creutzig wrote:
So if I understand well, I agree that xml is a format for filtering, not a human-writable format. TeX, LaTeX or ConTeXt is an input language, which should be convertible to the powerful master XML format.
No, sorry. This only works for extremely simple TeX code. Forget about any real-world mathematics. Forget about 80% of what real-world LaTeX users type into their computers. TeX was simply never designed to be easily parsed.
--
Professor Idris Samawi Hamid
Department of Philosophy
Colorado State University
Fort Collins, CO 80523
Idris Samawi Hamid wrote:
Would it be possible to define an xml format for the journal so that I could more easily process both ConTeXt/LaTeX articles as well as the docs and rtfs I generally receive? Is this more work than it's worth? It's a humanities journal, so little-to-no math.
Math is, in my experience, the worst part of it, so you can consider yourself lucky that you don't need it.
The question is, what problems of the current process are you trying to solve with a possible move to xml?
If your most pressing problem is the variety of data formats you receive articles in, then no, xml won't help. You'd still need some way of transforming the articles to the format of your choice. That being said, XML may be a very good intermediate step from Word or rtf to ConTeXt, if only because OpenOffice has pretty advanced import filters and stores its data in a straightforward xml format that should be easy to transform. This assumes you start with a sufficiently rich set of predefined formats and somehow get people either to use them (fat chance, I know) or to format things distinctly enough that you can automatically, or at least semi-automatically, map the author's formatting onto your presets. In really simple cases (e.g., pure prose) you may get away with accepting HTML and converting that.
If your most serious problem is a variety of output formats you want to support (print/pdf, html, some eBook variants, ...), xml is a perfect technique to develop a solution.
If getting lots of different encodings is a problem of yours, xml solves that nicely as well. But just for that, there are simpler and less intrusive ways.
Other things xml may solve well:
- archivability (although your ConTeXt files are probably no worse)
- reusability: Almost everything in a file following a well-designed xml format is local, and you can simply copy a (complete) block of text + markup and insert it into another file.
- consistency, enforcing rules: While it is possible to enforce things like “every article must start with an abstract containing one to three paragraphs” in TeX, it is way easier in xml.
- all sorts of conversions, including shuffling around or extracting data of interest
Things xml won't do any magic for:
- layout. You'd need to write a conversion to ConTeXt or whatever. Depending on your needs, this can be anything from trivial (say, two hours) to almost undoable (although this would mean the xml format is particularly badly designed for your journal).
Both lists are certainly incomplete. I hope you will get other answers as well.
regards,
Christopher
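To make the consistency point concrete, the rule quoted in the reply above (“every article must start with an abstract containing one to three paragraphs”) could be checked with a few lines of script. This is only a sketch: the element names `article`, `abstract` and `p` and the function `check_abstract` are assumptions made for illustration, and in practice a schema language such as a DTD or RELAX NG would probably express the same rule declaratively.

```python
import xml.etree.ElementTree as ET

def check_abstract(xml_text):
    """Return a list of rule violations (empty if the article passes)."""
    root = ET.fromstring(xml_text)
    children = list(root)
    # Rule 1: the first child of the article must be the abstract.
    if not children or children[0].tag != "abstract":
        return ["article must start with an <abstract>"]
    # Rule 2: the abstract holds between one and three paragraphs.
    count = len(children[0].findall("p"))
    if not 1 <= count <= 3:
        return [f"<abstract> has {count} paragraphs, expected 1 to 3"]
    return []

good = "<article><abstract><p>Summary.</p></abstract><p>Body.</p></article>"
bad = "<article><p>Body first.</p><abstract><p>Late.</p></abstract></article>"
print(check_abstract(good))
print(check_abstract(bad))
```

Running such a check over every incoming article before conversion would catch structural problems early, independently of how the article is later typeset.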
participants (4)
- Christopher Creutzig
- Duncan Hothersall
- Idris Samawi Hamid
- Maurice Diamantini