Ugly hack for multiple MSWord docs.
Frequently I need to combine several MSWord and/or RTF documents into a single file for either pdftex or Context. I have settled on this strategy:

1. If necessary, I convert the documents to RTF with Open Office Writer.
2. I convert the resulting RTF documents to LaTeX using rtf2latex2e.
3. I need to rename some of the LaTeX commands to their plain TeX or Context equivalents, and simply ignore others. Instead of editing each and every occurrence, I add the following to my "macros.tex" file, which heads up the document:

----------------------------------------------------
\def\documentclass{}
\def\newcommand{}
\def\usepackage{}
\def\tab{}
\def\hspace{}
\def\begin{}
\def\end{}
\def\textbf#1{\bf #1}
\def\nobreakspace{~}
\def\underline{}
\def\newpage{}
\def\textmd#1{\rm #1}
\def\textit#1{\it #1}
\def\large{\tfb}
\def\reg{\rm\char174\ }
\def\textregistered{\reg}
------------------------------------------------------

I create a master file that calls in each of the .tex files and compile the whole goulash. If I missed a LaTeX tag, I add it to my \defs shown above and recompile until I get a clean run.

Now I have a readable pdf file and can start correcting the format. The scattered LaTeX tags give me hints where centering etc. might be needed, even though the tags are inoperative in Context thanks to my nullifying \def statements shown above.

Someday there will be an elegant solution to the MSWord to Context problem. For now there is my ugly hack as described here.

-- John Culleton
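The "add a \def and recompile" loop described above could be shortened by scanning the converted files first. The following is only a rough sketch in Python, not part of the original workflow: the function names and the `known` parameter are made up for illustration, and the scan is purely textual, so it reports candidates only. Many of the names it lists will be ordinary TeX or Context commands that need no \def at all.

```python
import re

# Matches "\def\name" and captures "name" (letters only, as in the macro list above).
DEF_RE = re.compile(r"\\def\s*\\([A-Za-z]+)")
# Matches any control word such as \textbf or \newpage.
CMD_RE = re.compile(r"\\([A-Za-z]+)")


def defined_commands(macros_text):
    """Return the set of command names that the macros file defines with \\def."""
    return set(DEF_RE.findall(macros_text))


def missing_commands(converted_text, macros_text, known=()):
    """List control words used in converted_text that macros_text does not \\def.

    `known` (hypothetical parameter) holds names that need no \\def,
    for example commands that TeX or Context already provides.
    """
    defined = defined_commands(macros_text) | set(known)
    used = set(CMD_RE.findall(converted_text))
    return sorted(used - defined)


if __name__ == "__main__":
    # Tiny made-up inputs standing in for macros.tex and one converted file.
    macros = r"\def\textbf#1{\bf #1} \def\newpage{}"
    converted = r"\textbf{Title} \newpage \textsc{Name} \centering"
    print(missing_commands(converted, macros))
```

In practice the two strings would be read from macros.tex and from each file produced by rtf2latex2e, and the reported names reviewed by hand before adding any \def.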
John R. Culleton wrote:
Someday there will be an elegant solution to the MSWord to Context problem. For now there is my ugly hack as described here.
maybe the word xml output, since that can be parsed

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
On Thursday 15 June 2006 08:50, Hans Hagen wrote:
maybe the word xml output, since that can be parsed
Hans

Interesting suggestion. I don't have a copy of MSWord. And my clients are naive, so asking them to save in exotic formats is likely to be unproductive.

Open Office does not save as xml. Abiword, however, does. In a simplistic test case ("Now is the time for all good men.") Abiword saved the document as xml with a little coaxing, and texexec compiled it clean. So at least there is something there to experiment with. Next I will try a real MSWord document, save it as xml from Abiword, and see what Context does with it.

One question: How do I mix in the necessary Context commands such as papersize, font selection etc.? What are the rules and no-nos for blending Context commands into an xml document?

-- John Culleton
Books with answers to marketing and publishing questions:
http://wexfordpress.com/tex/shortlist.pdf
Book coaches, consultants and packagers:
http://wexfordpress.com/tex/packagers.pdf
John R. Culleton wrote:
Interesting suggestion. I don't have a copy of MSWord. And my clients are naive so that asking them to save in exotic formats is likely to be unproductive.
Open Office does not save as xml. Abiword, however does. In a
hm, open office uses xml as its storage format: just save in oo format and unzip the file and you will end up with xml files (however, the xml is typical office xml, complete with tab elements that spoil the idea)
One question: How do I mix in the necessary Context commands such as papersize, font selection etc.? What are the rules and no-nos for blending Context commands into an xml document?
just set up a style

Hans
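The "save in oo format and unzip" step from the message above can be tried without any special tools, since the saved file is an ordinary zip archive. What follows is a minimal sketch in Python using only the standard zipfile module. The helper names are invented for illustration, content.xml is the usual name of the main text member in OpenDocument packages, and the demo archive built in memory is a stand-in, not a real Writer file.

```python
import io
import zipfile


def list_members(package):
    """Return the names of the files stored inside the zipped document."""
    with zipfile.ZipFile(package) as archive:
        return archive.namelist()


def read_content_xml(package, member="content.xml"):
    """Return the text of one XML member of the zipped document."""
    with zipfile.ZipFile(package) as archive:
        return archive.read(member).decode("utf-8")


def make_demo_package():
    """Build a tiny stand-in archive in memory (not a real Writer file)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "content.xml",
            "<doc><p>Now is the time for all good men.</p></doc>",
        )
        archive.writestr("styles.xml", "<styles/>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    demo = make_demo_package()
    print(list_members(demo))
    print(read_content_xml(demo))
```

Both helpers also accept a file name, so a path to a document saved from Open Office Writer can be passed in place of the in-memory demo.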
On Thursday 15 June 2006 13:55, Hans Hagen wrote:
hm, open office uses xml as its storage format: just save in oo format and unzip the file and you will end up with xml files
(however, the xml is typical office xml, complete with tab elements that spoil the idea)
The Abiword xml is neat and parsimonious, thus:

------------------------------------------------------------------
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
 "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<book>
<!-- ================================================================================ -->
<!-- This DocBook file was created by AbiWord. -->
<!-- AbiWord is a free, Open Source word processor. -->
<!-- You may obtain more information about AbiWord at www.abisource.com -->
<!-- ================================================================================ -->
<chapter>
  <title></title>
  <section role="unnumbered">
    <title></title>
    <para>Now is the time for all good men.</para>
  </section>
</chapter>
</book>
------------------------------------------------

The Open Office file, unzipped, is a lot more verbose and a lot less readable. There are in fact five files. The file content.xml does compile correctly via texexec and yields the expected result. The character count in that file alone is three times that of the corresponding Abiword xml output shown above.

The experiments continue...

-- John Culleton
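To show how little structure has to be walked in DocBook output of this kind, here is a small sketch in Python using the standard xml.etree.ElementTree module. It is only an illustration of the file being easy to parse, not a substitute for a Context style. The SAMPLE string is a trimmed copy of the test document above, and the function name is made up.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the AbiWord test document (comments left out).
SAMPLE = """<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
 "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<book>
<chapter>
<title></title>
<section role="unnumbered">
<title></title>
<para>Now is the time for all good men.</para>
</section>
</chapter>
</book>
"""


def paragraphs(docbook_text):
    """Return the text of every <para> element, in document order."""
    root = ET.fromstring(docbook_text)
    return ["".join(para.itertext()).strip() for para in root.iter("para")]


if __name__ == "__main__":
    for text in paragraphs(SAMPLE):
        print(text)
```

The same loop over chapter, section and title elements would give the skeleton of the book, which is roughly the information a style has to map onto Context commands.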
It's also true that "On 8 May 2006, the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) approved the OpenDocument Format (ODF) for release as ISO/IEC 26300".

ODF could become an important xml format in the next few years.

luigi
On Jun 13, 2006, at 5:29 PM, John R. Culleton wrote:
Frequently I find myself in the position of needing to combine several MSWord and/or rtf documents into a single file for either pdftex or Context. I have settled on this strategy.
<snip>
Someday there will be an elegant solution to the MSWord to Context problem. For now there is my ugly hack as described here.
MEMORY DISCLAIMER: In these examples none of the function names are
really what they are in Word or VB for Word. The functions are
available in VB for Word, but it's been some time since I've done
this, I don't have the macros these days and don't really know the
real names anymore. So they are just representative of the functions
available.
STYLE COMMENT: These methods should work even if styles are not being
used. For example the primary heading may be Arial, 18pt, bold and
not the Heading 1 style. That's okay because you can search for font
attributes in Word. If the document is not consistent, well, convert
to text and markup manually. :)
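The idea in the style comment, matching on font attributes when no
styles were used, can be sketched outside Word as well. The Python
below is only an illustration under stated assumptions: the Paragraph
record and the HEADING_RULES table are hypothetical, they are not
Word or VB objects, and a real document would need its own rules.

```python
from dataclasses import dataclass


@dataclass
class Paragraph:
    """Hypothetical record of one paragraph and its font attributes."""
    text: str
    font: str
    size_pt: int
    bold: bool = False


# Assumed rule for illustration: (font, size, bold) that marks a primary heading.
HEADING_RULES = {("Arial", 18, True): "chapter"}


def to_markup(paragraphs):
    """Wrap paragraphs whose font attributes match a rule; pass the rest through."""
    lines = []
    for p in paragraphs:
        command = HEADING_RULES.get((p.font, p.size_pt, p.bold))
        if command:
            lines.append("\\%s{%s}" % (command, p.text))
        else:
            lines.append(p.text)
    return lines


if __name__ == "__main__":
    doc = [
        Paragraph("Introduction", "Arial", 18, True),
        Paragraph("Body text.", "Times", 12),
    ]
    print("\n".join(to_markup(doc)))
```

If the document is not consistent, no rule table will match
reliably, which is the case where manual markup is the fallback.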
MORE OR LESS CURRENT EXAMPLE
It's not particularly elegant, but I used to convert from MSWord to
whatever by writing VB find/replace macros based on styles and
formatting. In newer versions of Word (at least on OS X), Replace has
a function that includes what you found, plus you can add other text.
Example:
Find:
participants (4): Bob Kerstetter, Hans Hagen, John R. Culleton, luigi scarso