Typesetting LibreOffice (ODT) documents with ConTeXt
I scoured the wiki and mailing list without finding a definite answer. The most recent discussion I can find is from 2006 and at that time it was "possible" but nobody had yet developed the appropriate template, XSLT style-sheet, module or whatever to actually do it.
For a number of reasons (including an absolute necessity to produce MS-compatible .doc files) I need to maintain and write documents using LibreOffice Writer (or OO.org Writer) but the quality of the PDF files is, shall we say, not satisfactory. Exporting to LaTeX 2e is possible (and standard equipment in LO-W) but after using both for a while now, I vastly prefer ConTeXt. I could probably use something like the TEI tools to transform the ODT file to XHTML or TEI P5 and process that, but I've found over many years that such intermediate transformations have a lot of problems of their own.
I don't need math support for /my/ work but I am sure others who do need it would like to follow the same route to great PDFs.
Any solutions?
-- Bill Meahan Westland, Michigan USA
Hi Bill,
On Wed, 30 Jan 2013 12:31:51 -0700, Bill Meahan
Have you considered using markdown/pandoc? You can either
1) convert odt to markdown, then markdown to context. Or better:
2) write in markdown and convert to odt/docx or context as needed (via pandoc).
ConTeXt also has a markdown mode, so you can also choose to process markdown directly in MkIV.
Unless your typesetting needs are really complicated, 2) may be worth checking out. For simple academic work (e.g. journal articles) destined for a Word/docx workflow this is my preferred option.
Best wishes
Idris
-- Professor Idris Samawi Hamid Department of Philosophy Colorado State University Fort Collins, CO 80523
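Option 2) above can be sketched as a small Python wrapper around the pandoc command line. This is only an illustration, assuming pandoc is installed and on the PATH; the file name `article.md` and the helper names are hypothetical, while `-s`, `-f`, `-t` and `-o` are standard pandoc options and `context`, `odt` and `docx` are standard pandoc output formats.

```python
import shutil
import subprocess
from pathlib import Path

# Output formats used in this sketch, with the file extension each one gets.
TARGETS = {"context": ".tex", "odt": ".odt", "docx": ".docx"}


def pandoc_command(source, target):
    """Build the pandoc command line that converts a Markdown file to `target`."""
    if target not in TARGETS:
        raise ValueError(f"unsupported target: {target}")
    output = str(Path(source).with_suffix(TARGETS[target]))
    # -s asks pandoc for a standalone document rather than a fragment.
    return ["pandoc", "-s", "-f", "markdown", "-t", target, "-o", output, source]


def convert(source, target):
    """Run pandoc if it is available; return the command line either way."""
    cmd = pandoc_command(source, target)
    if shutil.which("pandoc") is None:
        print("pandoc not found; would run:", " ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Only print the command lines; nothing is converted here.
    for fmt in TARGETS:
        print(" ".join(pandoc_command("article.md", fmt)))
```

The same Markdown source then feeds both the ConTeXt route (for the PDF) and the odt/docx route (for the word-processor workflow).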
On 01/30/2013 02:45 PM, Idris Samawi Hamid ادريس سماوي حامد wrote:
Have you considered using markdown/pandoc? You can either
I appreciate the suggestion but it does not meet my needs. I currently use GNU Emacs and YASnippet. All my work to date is already in raw ConTeXt, which YASnippet and my snippet collection make easier than Markdown. TEA, Textadept and SciTE provided similar functionality.
I am familiar with Markdown and don't like it in the least. Markup (whether Markdown, (X)HTML, Textile, raw ConTeXt or whatever) is still 1970s-style processing (we did roff/nroff/troff in those days). Part of the objective is to get away from plain text + markup so I (and the other users) can get near-WYSIWYG processing and preview the documents as they are written. No doubt some fine-tuning would be required with a direct-to-ConTeXt method but it still has other advantages, including making it possible for my wife to use it. :) She was an excellent social worker but her computer skills are quite wanting even after many years of my tutelage.
An XSLT stylesheet would allow direct export of a document from LO-W which could then be tweaked if necessary.
-- Bill Meahan Westland, Michigan USA
On Wed, 30 Jan 2013, Bill Meahan wrote:
An XSLT stylesheet would allow direct export of a document from LO-W which could then be tweaked if necessary.
Another option is to uncompress the odt file (IIUC, it is just a zip) and process it directly in ConTeXt (http://www.pragma-ade.com/general/manuals/xml-mkiv.pdf). This approach is more flexible than XSLT stylesheets, but it ties you to ConTeXt (with XSLT, in principle, you can switch to other formats relatively easily).
In essence it boils down to understanding the ODT XML schema and figuring out the mapping to ConTeXt commands.
Aditya
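To make the "it is just a zip" point concrete, here is a minimal Python sketch that opens an ODT container and lists the paragraphs found in its `content.xml`. It only inspects the XML and does not do the ConTeXt-side processing described in the manual above. The helper names and the in-memory demo file are hypothetical; the demo is a stand-in with just enough structure for the example, not a complete ODT.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces defined by the OpenDocument format.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def read_content_xml(odt):
    """Return the bytes of content.xml from an ODT path or file-like object."""
    with zipfile.ZipFile(odt) as archive:
        return archive.read("content.xml")


def paragraphs(content_xml):
    """Yield (style name, plain text) for every text:p element."""
    root = ET.fromstring(content_xml)
    for p in root.iter(f"{{{TEXT_NS}}}p"):
        style = p.get(f"{{{TEXT_NS}}}style-name", "")
        yield style, "".join(p.itertext())


def make_demo_odt():
    """Build a tiny stand-in for an ODT file in memory (hypothetical content)."""
    xml = (
        f'<office:document-content xmlns:office="{OFFICE_NS}"'
        f' xmlns:text="{TEXT_NS}">'
        "<office:body><office:text>"
        '<text:h text:style-name="Heading_20_1">Chapter</text:h>'
        '<text:p text:style-name="Text_20_body">Hello '
        '<text:span text:style-name="T6">world</text:span>.</text:p>'
        "</office:text></office:body></office:document-content>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for style, text in paragraphs(read_content_xml(make_demo_odt())):
        print(f"{style}: {text}")
```

Listing the style names this way is a quick first step towards deciding which of them need a mapping to ConTeXt commands.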
On 30.01.2013 at 22:12, Aditya Mahajan wrote:
Another option is to uncompress the odt file (IIUC, it is just a zip), and process it directly in ConTeXt (http://www.pragma-ade.com/general/manuals/xml-mkiv.pdf).
ConTeXt is able to read from zip files.
This approach is more flexible than XSLT stylesheets, but it ties you to ConTeXt (with XSLT, in principle, you can switch to other formats relatively easily).
In essence it boils down to understanding the ODT XML Schema and figuring out the mapping to context commands.
Hans posted a simple example to process odt files a few years ago.
Wolfgang
On 01/30/2013 04:21 PM, Wolfgang Schuster wrote:
ConTeXt is able to read from zip files. Hans posted a simple example to process odt files a few years ago.
Wolfgang
Hmm. It didn't show up when I used the list search. Perhaps I simply missed it; I'll look again. Thanks.
-- Bill Meahan Westland, Michigan USA
On 01/30/2013 10:12 PM, Aditya Mahajan wrote:
Another option is to uncompress the odt file (IIUC, it is just a zip), and process it directly in ConTeXt (http://www.pragma-ade.com/general/manuals/xml-mkiv.pdf).
This approach is more flexible than XSLT stylesheets, but it ties you to ConTeXt (with XSLT, in principle, you can switch to other formats relatively easily).
In essence it boils down to understanding the ODT XML Schema and figuring out the mapping to context commands.
I am no expert here, but I tried this approach a while ago when I was typesetting an edited volume. The authors sent me MS Word files, which I saved in OpenOffice format. But the XML in OpenOffice was just too messy to deal with. It doesn't provide logical structure but tries to recreate the visual output, so you get dozens of different <span type="this"> and <span type="that"> elements which may be completely irrelevant. And whenever I thought I had figured out what some cryptic abbreviation (say, <span font="T6">) meant ("italic"), I then learnt that in the next document I opened, it might mean something completely different.
I would be interested in finding a fully automated workflow, but I'm somewhat sceptical that it exists. And don't even think about round-trip conversion; I don't think this will be possible.
Just my 2 cents.
Thomas
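The shifting meaning of names like T6 follows from how ODF stores them: they are automatic styles, defined afresh inside each document, so they have to be resolved per file rather than hard-coded. In ODF the span carries the name in `text:style-name` (the attribute names quoted above are paraphrases). The Python sketch below resolves each automatic style to a ConTeXt font switch; the helper names and the two demo documents are hypothetical, and only italic and bold are handled.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OpenDocument format.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}

# (italic, bold) -> ConTeXt font switch; anything else maps to no switch.
SWITCHES = {(True, False): r"\it", (False, True): r"\bf", (True, True): r"\bi"}


def q(prefix, name):
    """Return the qualified tag or attribute name ElementTree expects."""
    return f"{{{NS[prefix]}}}{name}"


def automatic_styles(root):
    """Map every style name defined in this document to a font switch."""
    table = {}
    for style in root.iter(q("style", "style")):
        italic = bold = False
        for props in style.iter(q("style", "text-properties")):
            italic = italic or props.get(q("fo", "font-style")) == "italic"
            bold = bold or props.get(q("fo", "font-weight")) == "bold"
        table[style.get(q("style", "name"))] = SWITCHES.get((italic, bold), "")
    return table


def convert_spans(content_xml):
    """Render every text:span as ConTeXt, using this document's own styles."""
    root = ET.fromstring(content_xml)
    table = automatic_styles(root)
    result = []
    for span in root.iter(q("text", "span")):
        switch = table.get(span.get(q("text", "style-name")), "")
        text = "".join(span.itertext())
        result.append(f"{{{switch} {text}}}" if switch else text)
    return result


def demo_document(properties):
    """Hypothetical content.xml in which T6 has the given text properties."""
    return (
        "<office:document-content"
        f' xmlns:office="{NS["office"]}" xmlns:style="{NS["style"]}"'
        f' xmlns:text="{NS["text"]}" xmlns:fo="{NS["fo"]}">'
        "<office:automatic-styles>"
        '<style:style style:name="T6" style:family="text">'
        f"<style:text-properties {properties}/>"
        "</style:style></office:automatic-styles>"
        "<office:body><office:text><text:p>"
        '<text:span text:style-name="T6">word</text:span>'
        "</text:p></office:text></office:body>"
        "</office:document-content>"
    )


if __name__ == "__main__":
    print(convert_spans(demo_document('fo:font-style="italic"')))
    print(convert_spans(demo_document('fo:font-weight="bold"')))
```

Resolving styles per document removes the guesswork about T6, but it does not restore logical structure: a span that is italic for emphasis and one that is italic for a book title still look the same.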
On Wed, 30 Jan 2013 14:48:15 -0700, Thomas A. Schmitz
In essence it boils down to understanding the ODT XML Schema and figuring out the mapping to context commands.
Ah, it sounds so simple, doesn't it? :D
In light of years spent as the editor of an academic journal, with the corresponding pain involved in converting countless doc-file contributions to odt to context, I have to agree with Thomas. Of course Bill is apparently the author of the files he wishes to convert, so he can impose some structural discipline on his own odt work -- and perhaps teach his wife to write in the same style ;-)
But in general odt is too much of a mess for my limited skills. And although Bill does not "like it in the least" I am not aware of a better cross-format solution than markdown/pandoc whenever I am forced to deal with M$-Word workflows and ConTeXt in my own writing.
If I can go out on a limb: What Bill seems to want is a general WYSIWYG->ConTeXt solution. Generalizing Thomas's remark, I'm not sure that the word-processor paradigm is appropriate for such a thing (unless one is very disciplined in using the word processor). But a WYSIWYG structured layout processor like Framemaker (is there some free imitation out there?) may output xml that is more regular, predictable, and easier to map to ConTeXt than any M$-Word imitation.
Best wishes
Idris
-- Professor Idris Samawi Hamid Department of Philosophy Colorado State University Fort Collins, CO 80523
On Wed, 30 Jan 2013, Idris Samawi Hamid ادريس سماوي حامد wrote:
If I can go out on a limb: What Bill seems to want is a general WYSIWYG->ConTeXt solution. Generalizing Thomas's remark, I'm not sure that the word-processor paradigm is appropriate for such a thing (unless one is very disciplined in using the word processor). But a WYSIWYG structured layout processor like Framemaker (is there some free imitation out there?) may output xml that is more regular, predictable, and easier to map to ConTeXt than any M$-Word imitation.
For a *simple* WYSIWYG solution, have a look at zim (http://zim-wiki.org/). It is a desktop wiki, but it has support for basic structure elements (headings, bold, italic, etc., lists, images, hyperlinks). It has a native text-based format, and exports to HTML/Markdown/ReST. So, if you do not need any fancy features (tables, footnotes, etc.), it may be a suitable WYSIWYG editor. I assume that the generated HTML is clean, and it should be easier to handle than ODT.
Aditya
On 01/30/2013 05:13 PM, Idris Samawi Hamid ادريس سماوي حامد wrote:
But in general odt is too much of a mess for my limited skills. And although Bill does not "like it in the least" I am not aware of a better cross-format solution than markdown/pandoc whenever I am forced to deal with M$-Word workflows and ConTeXt in my own writing.
Everybody has their own preferences. As one-time net.god Henry Spencer put it, "The nice thing about standards is there are so many of them."
If I can go out on a limb: What Bill seems to want is a general WYSIWYG->ConTeXt solution. Generalizing Thomas's remark, I'm not sure that the word-processor paradigm is appropriate for such a thing (unless one is very disciplined in using the word processor). But a WYSIWYG structured layout processor like Framemaker (is there some free imitation out there?) may output xml that is more regular, predictable, and easier to map to ConTeXt than any M$-Word imitation.
Scribus (~InDesign) has an XML-based format, too, but no direct conversion to M$-Word. It doesn't look all that bad to me but I'm hardly an XML expert. At least it's free (beer and freedom). Sigil works directly on epub2 (XHTML+) but doesn't support epub3 (XHTML++) yet. TEI tools can convert odt -> XHTML, epub2, epub3 and several others including LaTeX, but not ConTeXt. How successfully is another question.
I write fiction with an occasional stab at poetry (mostly as part of a fictional work), not academic papers, so my considerations are somewhat different. The content and theme often require different typography and formatting on an individual book basis. (See Bringhurst.)
Sadly (and I really mean that) there are a couple of ebook publishers who /insist/ on submissions being in M$-Word format and then they will do the conversions to mobi, epub, fb2 and pdf themselves even if I can do a better job. Plus, most of my writer friends work in word processors, which means that by far the easiest way to exchange manuscripts for proofing & feedback is via the (ugh) .doc file.
-- Bill Meahan Westland, Michigan USA
On Wed, 30 Jan 2013 18:54:49 -0500
Bill Meahan
Plus, most of my writer friends work in word processors, which means that by far the easiest way to exchange manuscripts for proofing & feedback is via the (ugh) .doc file.
I have been able to teach some of my collaborators to exchange plain text. They mostly use MSWord as their editor. After one or two round trips in plain text format (.txt for them), they eventually learn to focus on content and forget about format. This is pretty easy with utf8 and ConTeXt as it is mostly readable text. One constraint, though, is to keep paragraphs to one very long line with no \n or \r. This is not a problem for me as I simply configure my editor to wrap its view (not the file). Alan
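The one-line-per-paragraph constraint can be enforced mechanically when a collaborator's editor hard-wraps the text anyway. A minimal Python sketch, assuming paragraphs are separated by blank lines; the function name is hypothetical. It is meant for plain prose only: joining lines would break a ConTeXt source that contains `%` comments or line-sensitive commands.

```python
import re


def one_line_per_paragraph(text):
    """Join hard-wrapped lines so that each paragraph is a single long line."""
    # Normalise Windows (\r\n) and old Mac (\r) line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Paragraphs are assumed to be separated by one or more blank lines.
    blocks = re.split(r"\n\s*\n", text.strip())
    # Collapse every run of whitespace inside a paragraph to a single space.
    return "\n\n".join(" ".join(block.split()) for block in blocks) + "\n"


if __name__ == "__main__":
    wrapped = "First paragraph,\nwrapped by an editor.\r\n\r\nSecond\nparagraph.\n"
    print(one_line_per_paragraph(wrapped), end="")
```

Running this on a returned .txt file before comparing versions keeps the differences limited to actual changes in content.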
On 2013-01-31 at 00:54, Bill Meahan wrote:
Scribus (~InDesign) has an XML-based format, too but no direct conversion to M$-word. Doesn't look all that bad to me but I'm hardly an XML expert.
Some 10 years ago I was looking for an XML-based layout format to use as an exchange standard for newspaper ads between a web-based editor and other layout/workflow tools. I looked at Scribus - at that time a nearly undocumented mess. Maybe it's better now.
At least it's free (beer and freedom). Sigil works directly on epub2 (XHTML+) but doesn't support epub3 (XHTML++) yet. TEI tools can convert odt -> XHTML, epub2 epub3 and several others including LaTeX but not ConTeXt. How successfully is another question.
(X)HTML is also (used, even if not planned as) view-based, not structurally meaningful, so you'd need a limited and defined "subset" of HTML to make meaningful TeX code from it - not very different from word processor usage.
It *is* possible to use MS Word with proper styles and structure...
Greetlings, Hraban
--- http://www.fiee.net/texnique/ http://wiki.contextgarden.net https://www.cacert.org (I'm an assurer)
On 30/01/13 20:45, Idris Samawi Hamid ادريس سماوي حامد wrote:
Have you considered using markdown/pandoc? You can either
pandoc would be the perfect tool for this purpose (one [extended markdown] source to generate them all), but it has some shortcomings.
The most important limitation is that it doesn't allow language tagging. This isn't a problem if you don't mix languages (or you don't mind them being wrongly hyphenated).
Another important issue is that pandoc is not able to mark blocks (or text spans) with identifiers or classes.
In my opinion, these are the two most important issues that render pandoc a less-than-perfect tool to generate documents in different formats from a single source.
Just in case it helps,
Pablo
-- http://www.ousia.tk
Hi Bill,
I will jump in here, having followed this thread.
There is a more direct method that you can use, though at first it requires some work. Then again, it might not work if the formatting used is quite complex.
A long while ago I had to join several Word documents to form a book and output it as a PDF for the publisher. As usual with all collaborative work in academia, nobody followed the guidelines. Word choked on putting such a large document together. A real mess! So I decided to convert everything to LaTeX. I wrote a few Word macros that converted the quotation marks to commands, converted the footnotes to LaTeX commands, translated the öäß, etc. to LaTeX, and converted the Word formatting to LaTeX commands and environments with names of my own liking.
I saved the documents as standard text files. In LaTeX I then set up the environments as I needed them.
This workflow worked quite well.
regards
Keith.
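The character-level part of such a conversion is easy to sketch. The following uses Python rather than the Word macros described above, which were not posted, so this is a stand-in and not a reconstruction of them. It maps German umlauts, ß and typographic double quotes to plain LaTeX input; the mapping table is an assumption chosen for illustration, and footnotes and style-to-environment mapping are not covered.

```python
# Character-to-LaTeX mapping; an illustrative selection, not a complete table.
LATEX_MAP = {
    "ä": r"\"a", "ö": r"\"o", "ü": r"\"u",
    "Ä": r"\"A", "Ö": r"\"O", "Ü": r"\"U",
    "ß": r"\ss{}",
    "\u201c": "``",  # left double quotation mark
    "\u201d": "''",  # right double quotation mark
}

# str.maketrans accepts a dict of single characters mapped to strings.
TRANSLATION = str.maketrans(LATEX_MAP)


def to_latex(text):
    """Replace the mapped characters; everything else passes through unchanged."""
    return text.translate(TRANSLATION)


if __name__ == "__main__":
    print(to_latex("\u201cGröße\u201d und Straße"))
```

With UTF-8 input and ConTeXt MkIV this step is unnecessary for the accented letters themselves; it matters mainly for older LaTeX setups of the kind described above.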
On 01/31/2013 04:19 AM, Keith J. Schultz wrote:
Hi Bill,
I will jump in here after I have been following this thread.
I saved the documents as standard text files. In LaTeX I then set up the environments as I needed them.
This workflow worked quite well.
regards Keith.
Yeah, if all else fails I can always write a few Perl scripts. :) Before I retired, I wrote a LOT of Perl code that supplemented a huge corporate application in real time. Perl may not be the favored language these days but it gets the job done. :)
I also edit (and report for and ...) an organizational newsletter. I use Scribus for that.
Thanks for the input.
-- Bill Meahan Westland, Michigan USA
Aditya and Idris were sufficiently strong in their recommendation to use Markdown+pandoc for multi-format document production (including ConTeXt) that I decided to take another look.
Facepalm! I suddenly realized the custom formatting I thought I would lose is simply a matter of creating CSS and ConTeXt environment files to specify the formatting and styling. I already use environment files according to the ConTeXt Project Structuring, and translating that to CSS is more a matter of spending some time doing it than of complexity.
Let's go for two facepalms. What made the difference is a little editor, written entirely in Python so it is cross-platform, called ReText. It is less powerful than Emacs but has the advantage of almost-real-time preview of what the produced document will look like in plain HTML. It's not quite WYSIWYG but it _is_ the next best thing and satisfies my needs.
You can find ReText at http://sourceforge.net/projects/retext/
As I said, it should work for Windows and Mac users, but you must have WebKit installed for the preview feature. Mac users have it by default since Safari uses it. Actually, Apple wrote it. Of course, you need Python installed as well.
Thanks again!
-- Bill Meahan Westland, Michigan USA
On Sat, 9 Feb 2013, Bill Meahan wrote:
Aditya and Idris were sufficiently strong in their recommendation to use Markdown+pandoc for multi-format document production (including ConTeXt) I decided to take another look.
Sooner or later, you'll reach the limit of markdown. In those situations, I use gpp to preprocess the file. See
http://randomdeterminism.wordpress.com/2012/06/01/how-i-stopped-worring-and-...
Aditya
On 02/09/2013 06:44 PM, Aditya Mahajan wrote:
Sooner or later, you'll reach the limit of markdown. In those situations, I use gpp to preprocess the file. See
http://randomdeterminism.wordpress.com/2012/06/01/how-i-stopped-worring-and-...
Aditya
I saw that early on, and knowing that was possible helped me decide I /would/ use pandoc. :)
It looks as if reStructuredText and its "roles" may help quite a bit in mapping to CSS definitions and custom style definitions in ConTeXt environment files. That may well be all I need for my particular situation(s). Since pandoc can use either one, ReText can be set to use rst instead of markdown, and markdown & rst are quite similar, I'm expecting to use rst instead of markdown /per se/.
Pandoc 1.10 supports EPUB3 directly. Nice!
-- Bill Meahan Westland, Michigan USA
On Sat, 09 Feb 2013 09:51:46 -0700, Bill Meahan
What made the difference is a little editor, written entirely in Python so it is cross-platform, called ReText. It is less powerful than Emacs but has the advantage of almost-real-time preview of what the produced document will look like in plain HTML. It's not quite WYSIWYG but it _is_ the next best thing and satisfies my needs. You can find ReText at http://sourceforge.net/projects/retext/ As I said, it (should) work for Windows and Mac users but you must have WebKit installed for the preview feature. Mac users have it by default since Safari uses it. Actually, Apple wrote it. Of course, you need Python installed as well.
I never heard of retext before, so I spent some time with it... thanks for the reference! Unicode support seems solid, even bidi (via Qt).
OTOH it's waaay too geeky for the average citizen to install -- too many steps (python, pyqt,...), have to search for your .conf file through a python interpreter and you still can't find it after that etc. Then in Windows it turns out to be an .ini file, not a .conf file! Then you have to make a .bat file to start it with a mouse click, etc.
Now you previously mentioned working with colleagues etc. For Windows users MarkdownPad -- http://markdownpad.com/ -- has most of the same features as ReText (e.g. custom css) and is trivial to install.
Best wishes
Idris
-- Professor Idris Samawi Hamid Department of Philosophy Colorado State University Fort Collins, CO 80523
On 02/10/2013 12:07 PM, Idris Samawi Hamid ادريس سماوي حامد wrote:
I never heard of retext before, so I spent some time with it... thanks for the reference! Unicode support seems solid, even bidi (via Qt).
OTOH it's waaay too geeky for the average citizen to install -- too many steps (python, pyqt,...), have to search for your .conf file through a python interpreter and you still can't find it after that etc. Then in Windows it turns out to be an .ini file, not a .conf file! Then you have to make a .bat file to start it with a mouse click, etc.
I'm spoiled by Linux:
prompt$ sudo apt-get install retext
installs ReText and all dependencies. If you must have a GUI, Synaptic or even Ubuntu Software Center make it pretty easy by hiding the apt infrastructure. They will still auto-load all dependencies.
I like GUI stuff but I've been using computers from before there even was a command line so command lines are not scary for /me/. Yes, I'm that old.
/Once in a while/ Linux is easier to use than Windows. ;-)
Glad you found something that works well.
-- Bill Meahan Westland, Michigan USA
On Sun, 10 Feb 2013 11:47:22 -0700, Bill Meahan
Glad you found something that works well.
Retext seems to have better support than MarkdownPad for some pandoc markdown extensions like footnotes:
========================
Here is a footnote reference,[^1] and another.[^longnote]

[^1]: Here is the footnote.

[^longnote]: Here's one with multiple blocks.

    Subsequent paragraphs are indented to show that they belong to the previous footnote.

        { some.code }

    The whole paragraph can be indented, or just the first line. In this way, multi-paragraph footnotes work like multi-paragraph list items.

This paragraph won't be part of the note, because it isn't indented.
========================
In the html/live-preview Retext gives you a jumpback from the footnote to the original line... very nice.
Best wishes
Idris
-- Professor Idris Samawi Hamid Department of Philosophy Colorado State University Fort Collins, CO 80523
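Because editors differ in how well they handle this footnote syntax, a quick consistency check before running pandoc can catch references without a definition, and the reverse. A rough Python sketch using regular expressions rather than a real Markdown parser; the function name is hypothetical, and a reference that is immediately followed by a colon in running text would be missed.

```python
import re

# A definition starts a line: [^label]: text
DEFINITION = re.compile(r"^\[\^([^\]\s]+)\]:", re.MULTILINE)
# A reference is [^label] that is not directly followed by a colon.
REFERENCE = re.compile(r"\[\^([^\]\s]+)\](?!:)")


def check_footnotes(text):
    """Return (references without a definition, definitions never referenced)."""
    defined = set(DEFINITION.findall(text))
    referenced = set(REFERENCE.findall(text))
    return sorted(referenced - defined), sorted(defined - referenced)


if __name__ == "__main__":
    sample = (
        "Here is a footnote reference,[^1] and another.[^longnote]\n\n"
        "[^1]: Here is the footnote.\n\n"
        "[^longnote]: Here's one with multiple blocks.\n"
    )
    print(check_footnotes(sample))
```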
participants (9)
- Aditya Mahajan
- Alan BRASLAU
- Bill Meahan
- Henning Hraban Ramm
- Idris Samawi Hamid ادريس سماوي حامد
- Keith J. Schultz
- Pablo Rodríguez
- Thomas A. Schmitz
- Wolfgang Schuster