[NTG-context] Ugly hack for multiple MSWord docs.

John R. Culleton john at wexfordpress.com
Fri Jun 16 00:46:56 CEST 2006

On Thursday 15 June 2006 13:55, Hans Hagen wrote:
> John R. Culleton wrote:
> > On Thursday 15 June 2006 08:50, Hans Hagen wrote:
> >> John R. Culleton wrote:
> >>> Someday there will be an elegant solution to the MSWord to
> >>> Context problem. For now there is my ugly hack as described here.
> >>
> >> maybe the word xml output, since that can be parsed
> >>
> >> Hans
> >
> > Interesting suggestion. I don't have a copy of MSWord. And my
> > clients are naive so that asking them to save in exotic formats
> > is likely to be unproductive.
> >
> > Open Office does not save as xml. Abiword, however does. In a
> hm, open office uses xml as storage format, just save in oo format and
> unzip the file and you will end up with xml files
> (however, the xml is typical office xml, complete with tab elements that
> spoil the idea)

The AbiWord xml is neat and parsimonious, thus:


<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"

<!-- This DocBook file was created by AbiWord. -->
<!-- AbiWord is a free, Open Source word processor. -->
<!-- You may obtain more information about AbiWord at www.abisource.com -->

		<section role="unnumbered">
			<para>Now is the time for all good men.</para>

The unzipped Open Office file is a lot more verbose and a lot
less readable. There are in fact five files. The file content.xml
does compile correctly via texexec and yields the expected
result, but the character count in that file alone is three times
that of the corresponding AbiWord xml output shown above.
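
For anyone who wants to repeat the comparison, here is a rough
sketch in Python, not a finished tool. It assumes the Open Office
document is an ordinary zip archive containing content.xml (as
described above) and that the AbiWord export is a plain UTF-8 xml
file. The function names and the file names sample.odt and
sample.xml are made up for illustration.

```python
import zipfile


def list_members(odf_path):
    """Return the names of the files packed inside the Open Office document."""
    with zipfile.ZipFile(odf_path) as archive:
        return archive.namelist()


def content_char_count(odf_path, member="content.xml"):
    """Count the characters in one member of the archive (content.xml by default)."""
    with zipfile.ZipFile(odf_path) as archive:
        # Assumption: the member is UTF-8 encoded xml.
        return len(archive.read(member).decode("utf-8"))


def size_ratio(odf_path, abiword_xml_path):
    """Ratio of content.xml characters to the AbiWord xml export's characters."""
    # Assumption: the AbiWord export is a UTF-8 text file.
    with open(abiword_xml_path, encoding="utf-8") as handle:
        abiword_chars = len(handle.read())
    return content_char_count(odf_path) / abiword_chars
```

With the hypothetical file names, list_members("sample.odt") shows
what is in the archive, and size_ratio("sample.odt", "sample.xml")
gives the factor by which content.xml is larger than the AbiWord
file.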

The experiments continue...
John Culleton
