On Jun 13, 2006, at 5:29 PM, John R. Culleton wrote:
Frequently I find myself in the position of needing to combine
several MSWord and/or rtf documents into a single file for either
pdftex or Context. I have settled on this strategy.
<snip>
Someday there will be an elegant solution to the MSWord to
Context problem. For now there is my ugly hack as described here.
MEMORY DISCLAIMER: In these examples none of the function names are
what they really are in Word or VB for Word. The functions themselves
are available in VB for Word, but it's been some time since I've done
this; I don't have the macros anymore and no longer remember the real
names. So the names below just represent the kinds of functions
available.
STYLE COMMENT: These methods should work even if styles are not being
used. For example, the primary heading may be formatted as Arial, 18pt,
bold rather than with the Heading 1 style. That's okay because you can
search for font attributes in Word. If the document is not consistent,
well, convert it to text and mark it up manually. :)
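To make the style-versus-formatting point concrete, here is a rough Python sketch (not Word or VB code). It models each paragraph as a hypothetical Paragraph record holding a style name and a few font attributes, and picks out primary headings either by the Heading 1 style or by the Arial/18pt/bold combination. The record, its fields, and the "\r" paragraph mark are assumptions of the sketch, not Word's object model.

```python
from dataclasses import dataclass


@dataclass
class Paragraph:
    # Hypothetical stand-in for a Word paragraph: its text (ending in a
    # paragraph mark, modelled as "\r") plus style and direct formatting.
    text: str
    style: str = "Normal"
    font: str = "Times New Roman"
    size: float = 12.0
    bold: bool = False


def is_primary_heading(p: Paragraph) -> bool:
    # Match on the style name OR on the formatting, so a heading typed as
    # Arial 18pt bold in the "Normal" style is still caught.
    return p.style == "Heading 1" or (
        p.font == "Arial" and p.size == 18 and p.bold
    )


doc = [
    Paragraph("Introduction\r", font="Arial", size=18, bold=True),
    Paragraph("Body text.\r"),
    Paragraph("Methods\r", style="Heading 1"),
]

headings = [p.text for p in doc if is_primary_heading(p)]
print(headings)  # ['Introduction\r', 'Methods\r']
```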
MORE OR LESS CURRENT EXAMPLE
It's not particularly elegant, but I used to convert from MSWord to
whatever by writing VB find/replace macros based on styles and
formatting. In newer versions of Word (at least on OS X), Replace has
a function that inserts what you found, and you can add other text
around it.
Example:
Find: %find stuff formatted with heading 1 style
Replace: \subject{WhatItFound} %replaces what it found and wraps \subject{} around it.
Because Word stores paragraph formatting in the paragraph mark (the
carriage return at the end of the paragraph), the found text includes
that mark, so for paragraph styles you end up with something like this:
\subject{Some TeX
}
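Here is a rough Python sketch of what that replace does, using a hypothetical model where each paragraph is a (style, text) pair whose text keeps Word's paragraph mark, modelled here as "\r". wrap_style() is an illustrative name, not a Word or VB function.

```python
# Hypothetical model: each paragraph is (style, text), and the text keeps
# its trailing paragraph mark, modelled here as "\r".
paragraphs = [
    ("Heading 1", "Some TeX\r"),
    ("Normal", "Body text.\r"),
]


def wrap_style(paras, style, command):
    """Replace each paragraph in `style` with \\command{<found text>}."""
    out = []
    for s, text in paras:
        if s == style:
            # The found text still carries its "\r", so the closing
            # brace lands after the paragraph mark.
            text = "\\" + command + "{" + text + "}"
        out.append((s, text))
    return out


result = wrap_style(paragraphs, "Heading 1", "subject")
print(repr(result[0][1]))  # '\\subject{Some TeX\r}'
```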
So my last VB find/replace globally removes the carriage return before
each closing brace:
Find: ^p}
Replace: }
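That cleanup pass amounts to a single global string replacement. A minimal Python sketch, again assuming the paragraph mark shows up as "\r" (strip_mark_before_brace() is a hypothetical name):

```python
def strip_mark_before_brace(text: str) -> str:
    # Mirrors Find "^p}" / Replace "}" with Replace All, treating Word's
    # paragraph mark as "\r" (an assumption of this sketch).
    return text.replace("\r}", "}")


raw = "\\subject{Some TeX\r}Body text.\r"
print(repr(strip_mark_before_brace(raw)))  # '\\subject{Some TeX}Body text.\r'
```

Note that this also joins the heading to the paragraph that follows it, just as the Find/Replace above does; replacing with "}\r" instead would keep the break, which is a variation rather than the method described here.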
When done with all find/replace functions, save as text.
That's it.
Not being much of a script writer, I record the first find/replace,
then edit the macro and duplicate the find/replace as needed.
The VB find/replace function has options for starting at the top of
the file, replacing globally, continuing if nothing is found and that
sort of thing.
The macro looks something like this:
Find: %find stuff formatted with heading 1 style
Replace: \subject{WhatItFound} %replaces what it found and wraps \subject{} around it.
Find: %find stuff formatted with heading 2 style
Replace: \subsubject{WhatItFound} %replaces what it found and wraps \subsubject{} around it.
Find: %find stuff formatted with heading 3 style
Replace: \subsubsubject{WhatItFound} %replaces what it found and wraps \subsubsubject{} around it.
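Put together, the recorded macro is just a table of style-to-command pairs applied one global pass at a time, followed by the ^p} cleanup and a save as text. The Python below is a hypothetical sketch of that flow over a (style, text) paragraph model; STYLE_COMMANDS and convert() are made-up names, and turning "\r" into "\n" merely stands in for Word's save-as-text step.

```python
# Hypothetical (style, command) table; each entry stands for one
# recorded find/replace in the macro.
STYLE_COMMANDS = [
    ("Heading 1", "subject"),
    ("Heading 2", "subsubject"),
    ("Heading 3", "subsubsubject"),
]


def convert(paras):
    """paras: list of (style, text) pairs; each text ends in a paragraph mark."""
    paras = list(paras)
    # One global pass per table entry, like one Replace All per style.
    for style, command in STYLE_COMMANDS:
        paras = [
            (s, "\\" + command + "{" + t + "}") if s == style else (s, t)
            for s, t in paras
        ]
    body = "".join(t for _, t in paras)
    body = body.replace("\r}", "}")  # the final Find "^p}" / Replace "}"
    return body.replace("\r", "\n")  # stand-in for saving as text


doc = [
    ("Heading 1", "Intro\r"),
    ("Normal", "Text.\r"),
    ("Heading 2", "Detail\r"),
    ("Heading 3", "Fine point\r"),
]
print(repr(convert(doc)))
# '\\subject{Intro}Text.\n\\subsubject{Detail}\\subsubsubject{Fine point}'
```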
The above method uses global replacement and it's pretty zippy, for
Word.
ANOTHER OLDER METHOD
Another method I used before Find/Replace had the <WhatItFound>
function was to put the found string into a variable, then use that
variable for the replacement text, plus any TeX control sequences
wrapped around it.
In summary:
1. Put your finds and replaces in an array:
ArrayFind(0) Heading 1; ArrayReplace(0) \subject{
ArrayFind(1) Heading 2; ArrayReplace(1) \subsubject{
ArrayFind(2) Heading 3; ArrayReplace(2) \subsubsubject{
Note the closing } is missing. It is hardcoded in the replacement code.
2. Find the first array item starting from the top of the document.
This highlights the text in Word:
Find = $ArrayFind(n)
3. Put the highlighted text into a variable. Maybe you can even strip
the CRs from formatted paragraphs:
$FoundThisStuff = stripCarriageReturns(CurrentSelection)
4. Put the first replace item, the variable, and a hard-coded closing
brace into the Word Replace function, followed by a CR (assuming you
stripped the CR in step 3):
Replace = $ArrayReplace(n)+$FoundThisStuff+"}"+CR
5. Repeatedly use Replace and Find Next until nothing else is found.
Replace and Find Next
.
.
.
6. Move to the next array item and repeat steps 2 through 5 until you
reach the end of the array.
n = n + 1
Find = $ArrayFind(n)
.
.
.
7. Save the file as text.
FilesSaveAs using the text option
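For comparison, the older array-driven method can be sketched the same way. ARRAY_FIND, ARRAY_REPLACE, strip_carriage_returns() and convert_old() are hypothetical Python stand-ins for the pseudocode names in the steps above, and paragraphs are again modelled as (style, text) pairs ending in "\r".

```python
# Hypothetical stand-ins for ArrayFind/ArrayReplace; note the missing "}".
ARRAY_FIND = ["Heading 1", "Heading 2", "Heading 3"]
ARRAY_REPLACE = ["\\subject{", "\\subsubject{", "\\subsubsubject{"]
CR = "\r"  # Word's paragraph mark, as modelled here


def strip_carriage_returns(s):
    return s.replace(CR, "")


def convert_old(paras):
    paras = list(paras)
    for n in range(len(ARRAY_FIND)):  # step 6: next array item
        for i, (style, text) in enumerate(paras):  # steps 2 and 5
            if style == ARRAY_FIND[n]:
                found = strip_carriage_returns(text)  # step 3
                # step 4: replace item + found text + hard-coded "}" + CR
                paras[i] = (style, ARRAY_REPLACE[n] + found + "}" + CR)
    # step 7: "save as text"
    return "".join(t for _, t in paras).replace(CR, "\n")


doc = [
    ("Heading 1", "Intro\r"),
    ("Normal", "Text.\r"),
    ("Heading 2", "Detail\r"),
    ("Heading 3", "Fine point\r"),
]
print(repr(convert_old(doc)))
# '\\subject{Intro}\nText.\n\\subsubject{Detail}\n\\subsubsubject{Fine point}\n'
```

Because the closing brace and CR are added by the replacement code, each heading keeps its own line here, whereas the ^p} cleanup in the newer method joins it to the following text.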
Hmm. After thinking about this and typing it in, maybe I should still
use the OLD method. It appears to be a little easier to manage. Maybe
a lot easier.
Oh well, not a real programmer.