[NTG-context] Ugly hack for multiple MSWord docs.

Bob Kerstetter bkerstetter at mac.com
Thu Jun 15 18:45:35 CEST 2006

On Jun 13, 2006, at 5:29 PM, John R. Culleton wrote:

> Frequently I find myself in the position of needing to combine
> several MSWord and/or rtf documents into a single file for either
> pdftex or Context. I have settled on this strategy.

> <snip>

> Someday there will be an elegant solution to the MSWord to
> Context problem. For now there is my ugly hack as described here.

MEMORY DISCLAIMER: None of the function names in these examples are  
the real ones from Word or VB for Word. The functions exist in VB for  
Word, but it's been some time since I've done this, I don't have the  
macros these days and I don't remember the real names anymore. So  
they are just representative of the functions.

STYLE COMMENT: These methods should work even if styles are not being  
used. For example the primary heading may be Arial, 18pt, bold and  
not the Heading 1 style. That's okay because you can search for font  
attributes in Word. If the document is not consistent, well, convert  
to text and mark it up manually. :)


It's not particularly elegant, but I used to convert from MSWord to  
whatever by writing VB find/replace macros based on styles and  
formatting. In newer versions of Word (at least on OS X), Replace has  
a function that includes what you found, plus you can add other text.


Find: <Heading 1>        %find stuff formatted with heading 1 style

Replace: \subject{WhatItFound}       %replaces what it found and wraps \subject{} around it.

Because Word stores paragraph formatting in the paragraph mark (the  
line feed/carriage return), a paragraph-style match includes that mark  
and the closing brace lands on the next line, something like this:

\subject{Some TeX
}

So my last VB find/replace globally removes the carriage return in front of each closing brace:

Find: ^p}
Replace: }
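
For anyone doing this cleanup outside Word, here is a minimal sketch of the same last step in Python, run on the exported text. It assumes the paragraph mark ends up as an ordinary line ending in the saved file, and the function name pull_brace_up is made up for this example.

```python
def pull_brace_up(text: str) -> str:
    """Remove a line break that sits directly in front of a closing brace."""
    # Normalize Windows (CR LF) and old Mac (CR) line endings to LF first,
    # so a single search pattern covers all of them.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Equivalent of "Find: ^p}  Replace: }" applied to the whole text.
    return text.replace("\n}", "}")


if __name__ == "__main__":
    print(pull_brace_up("\\subject{Some TeX\n}\nBody text.\n"))
```

Like the Word version, this only removes the one line break immediately before each closing brace.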

When done with all find/replace functions, save as text.

That's it.

Not being much of a script writer, I record the first find/replace,  
then edit the macro and duplicate the find/replace as needed.

The VB find/replace function has options for starting at the top of  
the file, replacing globally, continuing if nothing is found and that  
sort of thing.

The macro looks something like this:

Find: <Heading 1>        %find stuff formatted with heading 1 style
Replace: \subject{WhatItFound}       %replaces what it found and wraps \subject{} around it.

Find: <Heading 2>        %find stuff formatted with heading 2 style
Replace: \subsubject{WhatItFound}       %replaces what it found and wraps \subsubject{} around it.

Find: <Heading 3>        %find stuff formatted with heading 3 style
Replace: \subsubsubject{WhatItFound}       %replaces what it found and wraps \subsubsubject{} around it.
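
The style-to-command mapping can be sketched outside Word as well. This is not the Word macro: it is a small Python model that assumes the document has already been reduced to (style name, paragraph text) pairs, which is a hypothetical stand-in for whatever Word exposes. The names HEADING_COMMANDS and wrap_headings are invented for the illustration.

```python
# Hypothetical stand-in for a Word document: a list of
# (style name, paragraph text) pairs, one pair per paragraph.
HEADING_COMMANDS = {
    "Heading 1": "subject",
    "Heading 2": "subsubject",
    "Heading 3": "subsubsubject",
}


def wrap_headings(paragraphs):
    """Wrap heading paragraphs in their ConTeXt command; pass the rest through."""
    lines = []
    for style, text in paragraphs:
        command = HEADING_COMMANDS.get(style)
        if command is None:
            # Not a heading style we know about: keep the text unchanged.
            lines.append(text)
        else:
            lines.append("\\" + command + "{" + text + "}")
    # A blank line between paragraphs, as TeX expects.
    return "\n\n".join(lines)


if __name__ == "__main__":
    doc = [("Heading 1", "Intro"), ("Normal", "Body."), ("Heading 3", "Detail")]
    print(wrap_headings(doc))
```

Adding another style is one more entry in the dictionary, which is roughly what duplicating a recorded find/replace does in the macro.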

The above method uses global replacement and it's pretty zippy, for  


Another method I used before Find/Replace had the <WhatItFound>  
function was to put the found string into a variable, then use that  
variable for the replacement text, plus any TeX control sequences  
wrapped around it.

In summary:

1. Put your finds and replaces in an array:
ArrayFind(0) Heading 1; ArrayReplace(0) \subject{
ArrayFind(1) Heading 2; ArrayReplace(1) \subsubject{
ArrayFind(2) Heading 3; ArrayReplace(2) \subsubsubject{
Note the closing } is missing. It is hardcoded in the replacement code.

2. Find the first array item starting from the top of the document.  
This highlights the text in Word:
Find = $ArrayFind(n)

3. Put the highlighted text into a variable. Maybe you can even strip  
the CRs from formatted paragraphs:
$FoundThisStuff = stripCarriageReturns(CurrentSelection)

4. Put the variable and the first replace item in the Word Replace  
function. Note the hard-coded closing brace, and the CR added back on  
the assumption that you stripped it in step 3:
Replace = $ArrayReplace(n)+$FoundThisStuff+"}"+CR

5. Repeatedly use Replace and Find Next until nothing else is found.
Replace and Find Next

6. Repeatedly find the next array item to the end of the array.
n = n + 1
Find = $ArrayFind(n)

7. Save the file as text.
FilesSaveAs using the text option
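
The seven steps above can be sketched as a loop. Again this is a Python model rather than VB for Word: it assumes the document is a list of (style name, paragraph text) pairs with each paragraph's line ending kept in its text, and find_next, strip_carriage_returns and convert are made-up names standing in for the Word functions whose real names I don't have.

```python
# Step 1: finds and replaces in parallel arrays. The closing brace is
# deliberately missing from the replacements; it is hard-coded below.
ARRAY_FIND = ["Heading 1", "Heading 2", "Heading 3"]
ARRAY_REPLACE = ["\\subject{", "\\subsubject{", "\\subsubsubject{"]


def strip_carriage_returns(text):
    """Step 3 helper: drop CR and LF characters from the found text."""
    return text.replace("\r", "").replace("\n", "")


def find_next(paragraphs, style, start):
    """Steps 2 and 5: index of the next paragraph in that style, or -1."""
    for i in range(start, len(paragraphs)):
        if paragraphs[i][0] == style:
            return i
    return -1


def convert(paragraphs):
    """Run every find/replace pair over the document and return plain text."""
    paragraphs = list(paragraphs)  # work on a copy
    for n in range(len(ARRAY_FIND)):                    # step 6: next array item
        pos = find_next(paragraphs, ARRAY_FIND[n], 0)   # step 2: start at the top
        while pos != -1:                                # step 5: until nothing found
            found = strip_carriage_returns(paragraphs[pos][1])  # step 3
            # Step 4: replacement text + found text + hard-coded "}" + CR.
            # The style is changed so the paragraph is not matched again.
            paragraphs[pos] = ("Converted", ARRAY_REPLACE[n] + found + "}\n")
            pos = find_next(paragraphs, ARRAY_FIND[n], pos + 1)
    return "".join(text for _, text in paragraphs)      # step 7: save as text


if __name__ == "__main__":
    doc = [("Heading 1", "Intro\n"), ("Normal", "Body.\n"), ("Heading 2", "Part\n")]
    print(convert(doc), end="")
```

Written out like this, the loop needs noticeably more bookkeeping than the global replace, which fits the conclusion below.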

Hum. After thinking about this and typing it in, maybe I should still  
use the OLD method. It appears to be a little easier to manage. Maybe  
a lot easier.
Oh well, not a real programmer.
