On 11.10.24 at 18:19, David Roderick wrote:
I have created 22 MS Word files, mainly text with embedded PNG diagrams. I would like to have these converted to ConTeXt and to have the (simple) diagrams redrawn; I was thinking of using MetaPost. I am looking to pay a typesetter to assist me in this task, and would be willing to pay for your time or for the time of a colleague you could direct me to. The goal is a typeset book. The MS Word files have already been professionally proofread. The subject matter is political economy from the 18th century and how it translates to, and may contradict, conventional wisdom within modern economics.

Other typesetters I have contacted have suggested running Pandoc to create a LaTeX file and using TikZ for the diagrams. But I have spent three days reading about advancements on the ConTeXt side of the industry, which I suspect is far more advanced. I would prefer to use better tools as a longer-term solution. Can you advise me?

David Roderick
On 10/11/2024 10:16 PM, Hraban Ramm wrote:
Hi David,

I would take your order, but you can also try it yourself with my tools: https://codeberg.org/fiee/context-tools/src/branch/master/docx2ctx

Pandoc can also generate ConTeXt code; try both and see which you like better.

Best, Hraban
www.fiee.net
I'd never go that route when, as mentioned, a high quality document is needed which can involve control (for which one can e.g. use instances for specific components). Coding in TeX speak is not that bad. Also, one is more likely to get answers for specific problems on the list.

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
On 10/13/2024 2:19 PM, angel.of.north@gmail.com wrote:
Hi Hans,

Are you saying that you would avoid both of Hraban's options, or just the latter (Pandoc)? What would you recommend for taking an MS Word file and producing a ConTeXt document? Should I be thinking of doing all the ConTeXt markup manually?

David
It depends on the document. You could do an initial conversion "à la Hraban" but from then on work in "TeX lingua". Once you are accustomed to it, it's not that hard. I admit that I never had to work in Word or anything other than TeX (before that, ASCII with simple markup, a bit like Markdown, but that was ages ago, before there was Markdown, so I wrote a parser and made sure all was formatted well for a daisy wheel line printer and/or laser printer).

Maybe just look around a bit in the documentation (source) tree to see how larger documents look in TeX code.

Hans
On 12.10.24 at 19:59, Hans Hagen via ntg-context wrote:
I'd never go that route when, as mentioned, a high quality document is needed which can involve control (for which one can e.g. use instances for specific components). Coding in TeX speak is not that bad. Also, one is more likely to get answers for specific problems on the list.
I don’t suggest using DOC(X) to directly create PDF via Pandoc and ConTeXt, just converting DOC(X) to ConTeXt for further processing. The direct approach works only with simple and/or very consistently styled documents.

For my youth novel, I write in LibreOffice, convert ODT to DOCX, convert that to ConTeXt with my script and finalize the code with an additional script, so that it doesn’t need further tweaking. This works only because I control the source and the converter.

My script does not try to keep the layout, just the structure and some formatting. It can extract embedded images (again, since I fixed the bug) and creates an \externalfigure for each, but that usually isn’t enough. The situation is similar with embedded tables (the conversion is not good; I’ll probably enhance it a bit as soon as I need it).

Hraban
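To illustrate what "keeping the structure, not the layout" can mean in practice, here is a minimal sketch in Python. It assumes that each Word paragraph carries a style ID (the `pStyle` / `w:val` attribute that also shows up in the debug log further down) and maps a few such IDs to ConTeXt structure commands. The dictionary `STYLE_TO_CONTEXT`, the style IDs in it and the function `paragraph_to_context` are hypothetical names chosen for this example; they are not taken from docx2ctx.py, whose actual mapping may differ.

```python
# Hypothetical mapping from Word paragraph style IDs to ConTeXt commands.
# The style IDs and the chosen commands are assumptions for illustration;
# this is not the mapping that docx2ctx.py actually uses.
STYLE_TO_CONTEXT = {
    "Title": "\\title",
    "Heading1": "\\chapter",
    "Heading2": "\\section",
    "Heading3": "\\subsection",
}


def paragraph_to_context(style_id, text):
    """Return ConTeXt source for one paragraph, keeping structure only."""
    command = STYLE_TO_CONTEXT.get(style_id)
    if command is None:
        # Unknown or body-text styles become plain paragraphs;
        # fonts, sizes and spacing from Word are deliberately dropped.
        return text + "\n"
    return f"{command}{{{text}}}\n"


if __name__ == "__main__":
    print(paragraph_to_context("Heading1", "Introduction"), end="")
    print(paragraph_to_context("Normal", "Plain body text."), end="")
```

A mapping like this only produces clean output when headings are marked with real heading styles rather than manual font changes, which is one reason the direct approach depends on consistently styled documents. The sketch does not escape TeX special characters such as % or #; a real converter would have to.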
I think that there is a bug in your docx2ctx.py. It works for files which do not have embedded pngs in them. But for files which do:

Traceback (most recent call last):
  File "/home/dmr104/Downloads/context-tools/docx2ctx/docx2ctx.py", line 872, in <module>
    process_doc(Path(doc), copy.copy(args))
  File "/home/dmr104/Downloads/context-tools/docx2ctx/docx2ctx.py", line 673, in process_doc
    result = obj.process()
             ^^^^^^^^^^^^^
  File "/home/dmr104/Downloads/context-tools/docx2ctx/docx2ctx.py", line 515, in process
    dst_fname = self.options['imagedir'] / pname.name
                ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
TypeError: unsupported operand type(s) for /: 'str' and 'str'

2024-10-12 23:21:10,979 DEBUG a_graphic ['xmlns:a']
2024-10-12 23:21:10,979 DEBUG pic_cNvPr ['id', 'name']
2024-10-12 23:21:10,980 DEBUG a_blip ['r:embed']
2024-10-12 23:21:10,980 DEBUG image reference found: rId7 = image2.png
2024-10-12 23:21:10,980 DEBUG p ['w14:paraId', 'w14:textId', 'w:rsidR', 'w:rsidRDefault', 'w:rsidP']
2024-10-12 23:21:10,980 DEBUG pStyle ['w:val']
2024-10-12 23:21:10,980 DEBUG p ['w14:paraId', 'w14:textId', 'w:rsidR', 'w:rsidRDefault', 'w:rsidP']
2024-10-12 23:21:10,980 DEBUG pStyle ['w:val']
2024-10-12 23:21:10,980 DEBUG p ['w14:paraId', 'w14:textId', 'w:rsidR', 'w:rsidRDefault', 'w:rsidP']
2024-10-12 23:21:10,981 DEBUG pStyle ['w:val']
2024-10-12 23:21:10,981 DEBUG p ['w14:paraId', 'w14:textId', 'w:rsidR', 'w:rsidRDefault', 'w:rsidP']
2024-10-12 23:21:10,981 DEBUG pStyle ['w:val']
2024-10-12 23:21:10,981 DEBUG p ['w14:paraId', 'w14:textId', 'w:rsidR', 'w:rsidRDefault', 'w:rsidP']
2024-10-12 23:21:10,981 DEBUG pStyle ['w:val']
2024-10-12 23:21:10,981 DEBUG Not an image file: [Content_Types].xml
2024-10-12 23:21:10,982 DEBUG Not an image file: _rels/.rels
2024-10-12 23:21:10,982 DEBUG Not an image file: word/document.xml
2024-10-12 23:21:10,982 DEBUG Not an image file: word/_rels/document.xml.rels
2024-10-12 23:21:10,982 DEBUG Not an image file: word/footnotes.xml
2024-10-12 23:21:10,982 DEBUG Not an image file: word/endnotes.xml
On 13.10.24 at 00:32, angel.of.north@gmail.com wrote:
I think that there is a bug in your docx2ctx.py. It works for files which do not have embedded pngs in them. But for files which do: TypeError: unsupported operand type(s) for /: 'str' and 'str'
I forgot a pathlib.Path, it’s fixed now. Apparently I never had embedded images since I changed that code… Hraban
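For readers who have not met this error before, the following Python sketch reproduces it in isolation and shows the kind of fix described. The values of `options` and `pname` are hypothetical stand-ins (the archive path `word/media/image1.png` is an assumption about where the image sits inside the DOCX file), and the actual change made in docx2ctx.py may look different.

```python
from pathlib import Path

# Hypothetical stand-ins for the values seen in the traceback.
options = {"imagedir": "images_00007"}   # a plain str, e.g. from the command line
pname = Path("word/media/image1.png")    # assumed path of the image inside the DOCX

# Failing form: options["imagedir"] / pname.name is str / str,
# and str does not define the "/" operator, so Python raises TypeError.
try:
    options["imagedir"] / pname.name
except TypeError as err:
    error_message = str(err)

# Working form: wrap the directory in Path so that "/" joins path components.
dst_fname = Path(options["imagedir"]) / pname.name

print(error_message)
print(dst_fname.as_posix())
```

Note that `pname.name` is itself a `str` (the final path component), so the left operand has to be a `Path` for the join to work.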
On 14.10.24 at 13:12, David Roderick wrote:
Thank you for fixing this. I espy another feature which is required. I save the PNGs into a subdirectory called images_00007, which is a subdirectory of the directory where my TeX files are located. I run ConTeXt from the latter (the main working directory). The code your script generates looks for the PNGs in this main directory:

\startplacefigure[location=here,reference=Picture 1178610960,title={}]% rId6
\externalfigure[image1.png]
\stopplacefigure

should be:

\startplacefigure[location=here,reference=Picture 1178610960,title={}]% rId6
\externalfigure[images_00007/image1.png][hfactor=fit]
\stopplacefigure

If you could fix this it would be greatly appreciated, and methinks useful.

David Roderick
You’re right, that makes sense. I added the path, but not the width setting. I never used extracted images unchanged; usually the result of the script is just raw material for me, so I can live with imperfections. Perhaps it would make sense to use adaptable templates for all constructs (paragraph, itemization, table, image…); for me, the main missing feature is splitting by chapter. This little script won’t become something like Pandoc. Hraban
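The idea of adaptable templates can be sketched in Python with the standard library's `string.Template`. This is only an illustration under stated assumptions: `FIGURE_TEMPLATE` and `render_figure` are hypothetical names, the default `hfactor=fit` is the setting requested earlier in the thread, and nothing here is taken from the actual docx2ctx.py code.

```python
from pathlib import PurePosixPath
from string import Template

# Hypothetical, user-editable template for one figure. "$" marks placeholders
# for string.Template; the ConTeXt commands are the ones shown in the thread.
FIGURE_TEMPLATE = Template(
    "\\startplacefigure[location=here,reference=$reference,title={$title}]\n"
    "\\externalfigure[$path][$options]\n"
    "\\stopplacefigure\n"
)


def render_figure(imagedir, filename, reference, title="", options="hfactor=fit"):
    """Fill the template, prefixing the file name with the image directory."""
    path = PurePosixPath(imagedir) / filename  # forward slashes on every platform
    return FIGURE_TEMPLATE.substitute(
        reference=reference, title=title, path=path.as_posix(), options=options
    )


if __name__ == "__main__":
    print(render_figure("images_00007", "image1.png", "Picture 1178610960"), end="")
```

Keeping the template as data rather than code means the width setting, the float location or the caption handling could be changed without touching the converter itself. A template whose text needs a literal dollar sign (for example TeX math) would have to write it as `$$`.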
You don't have to include the subdirectory in \externalfigure when you set the search path at the beginning of the document with

\setupexternalfigures[directory=images_00007]

Even the file extension can be left out.

%%%% begin example
\setupexternalfigures[directory=images_00007]

\starttext

\startplacefigure[reference=Picture 1178610960]
\externalfigure[image1][hfactor=fit]
\stopplacefigure

\stoptext
%%%% end example

Wolfgang
I have 22 images_000?? directories, and the file names start at image1, image2 in each one. So even if this path permitted globbing or regexps, the file names would clobber each other: should image1 be taken from the images_00006 or the images_00019 directory, for example?

Incidentally, the definition is:

\setupexternalfigures [...,...] [..,..=..,..]
1 NAME OPT
2 inherits: \setupexternalfigure

Does this mean that \setupexternalfigure (singular) inherits from \setupexternalfigures (plural)? Yes or no?

As written, "\setupexternalfigures inherits: \setupexternalfigure" says that the plural command is the child and inherits from the singular command, its parent. Is this a grammatical mistake? For example, if I inherit 10^6 euros from my father, then I, as the beneficiary, inherit what the deceased has bequeathed to me. In this analogy my father is the plural command and I am the singular one: I "inherit from" my father, and what I inherit is 10^6 euros. As it stands, the definition says that my father inherits me, that is, that I am inherited by my father.

Should the definition actually be (?):

\setupexternalfigures [...,...] [..,..=..,..]
1 NAME OPT
2 inherited by: \setupexternalfigure

David Roderick
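One possible way around the name clash, sketched in Python, is to copy all images into a single directory and prefix each file name with the name of the directory it came from, so that a single search path is enough. This is a suggestion under stated assumptions, not something proposed in the thread: the function `collect_images`, the `images_*` directory pattern and the target directory name `figures` are all hypothetical.

```python
import shutil
from pathlib import Path


def collect_images(source_root, target_dir, pattern="images_*"):
    """Copy every PNG from the per-file image directories into target_dir.

    Each copy is renamed to "<directory name>_<file name>", so image1.png
    from images_00006 and image1.png from images_00019 no longer collide.
    Returns a dict mapping each original path to its new file name.
    """
    source_root = Path(source_root)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    renamed = {}
    for directory in sorted(source_root.glob(pattern)):
        if not directory.is_dir():
            continue
        for image in sorted(directory.glob("*.png")):
            new_name = f"{directory.name}_{image.name}"
            shutil.copy2(image, target_dir / new_name)
            renamed[image] = new_name
    return renamed


if __name__ == "__main__":
    # Run from the main working directory that holds the images_* directories.
    for old, new in collect_images(".", "figures").items():
        print(f"{old} -> figures/{new}")
```

With the copies in place, one \setupexternalfigures[directory=figures] would cover all 22 files, provided the \externalfigure references in the generated TeX files are updated to the new names (the returned dictionary can drive that search-and-replace). The target directory should not itself match the `images_*` pattern, otherwise a second run would pick up its own output.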
On 14.10.24 at 18:17, Wolfgang Schuster wrote:
You don't have to include the subdirectory in \externalfigure when you set the search path at the beginning of the document with \setupexternalfigures[directory=images_00007]. Even the file extension can be left out.
I know, but for the generated code it makes sense to include both, IMO. Hraban
Participants (7): angel.of.north@gmail.com, David Roderick, Denis Maier, Hans Hagen, Henning Hraban Ramm, Hraban Ramm, Wolfgang Schuster