Improving pandoc's ConTeXt output
Hi all,

I'm currently trying to improve the ConTeXt output generated by pandoc, the document converter. There are two questions that we haven't decided on, and I'd be grateful to receive some feedback from ConTeXt experts on these issues:

1. As far as I understand, the `\section` syntax currently produced by pandoc should be considered mkii legacy syntax. We're likely going to switch to the modern `\startsection`/`\stopsection` syntax instead. Are there any concerns about retiring the old syntax? (Side note: pandoc already produces the new syntax, but only when called with `--section-divs`.)

2. Similarly, I'd like to start wrapping paragraphs with `\startparagraph`/`\stopparagraph`. It is important for me to get properly tagged PDF, but this would also make the output more verbose. Is that something that you would find bothersome, or do you see adding the extra environment by default as an acceptable practice?

Of course, I'd also be happy to get other suggestions on how to improve pandoc's ConTeXt support.

Thanks in advance,
Albert

-- 
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124
Dear Albert,

great to hear that! As my cooperative uses pandoc to produce print stuff with ConTeXt, we are always happy when there are improvements.

On Sun, Jun 05, 2022 at 09:07:37AM +0200, Albert Krewinkel via ntg-context wrote:
> 2. Similarly, I'd like to start wrapping paragraphs with `\startparagraph`/`\stopparagraph`. It is important for me to get properly tagged PDF, but this would also make the output more verbose. Is that something that you would find bothersome, or do you see adding the extra environment by default as an acceptable practice?
A similar question came up on the org-mode mailing list some weeks ago. (Background: Org-mode is a markup used with Emacs to make single-source publishing possible.)

Someone who maintains an export program for ConTeXt uses section levels. You get the nested subsections and subsubsections like this:

    \startsectionlevel
      \startsectionlevel
        \startsectionlevel
        \stopsectionlevel
      \stopsectionlevel
    \stopsectionlevel

This makes it possible to ignore the part-chapter-section naming convention and be more flexible. You can leave it to the style files to decide which level is a part, chapter, section, etc.

As I do not use this in production (I am only playing around with Emacs and org-mode), I cannot say if this is a good way.

Have you considered sectionlevel? What is your opinion?

juh

-- 
Author homepage: ......... http://literatur.hasecke.com
Satires & Essays: ......... http://www.sudelbuch.de
Private blog: ............ http://www.hasecke.eu
Net literature project: .... http://www.generationenprojekt.de
-----Original Message-----
From: ntg-context
On Behalf Of juh via ntg-context
Sent: Sunday, 5 June 2022 10:20
To: ntg-context@ntg.nl
Cc: juh
Subject: Re: [NTG-context] Improving pandoc's ConTeXt output
> Have you considered sectionlevel? What is your opinion?

Yes, I've also suggested that: https://github.com/jgm/pandoc/issues/5539

Denis
Dear juh,
juh via ntg-context writes:

> great to hear that! As my cooperative uses pandoc to produce print stuff with ConTeXt, we are always happy when there are improvements.

I'd love to learn more about your workflow, if you have time to share at some point! We chose ConTeXt as the intermediate format for the next iteration of the JOSS pipeline (the current pipeline is described here: https://www.ncbi.nlm.nih.gov/books/NBK579698/). Big thanks to Denis Maier for convincing me to try and go that route :)
> Have you considered sectionlevel? What is your opinion?

I absolutely see the appeal of that method, and I'd like pandoc to support it. The main question is probably whether the default should be `\startsection` or `\startsectionlevel`. I lean towards making `\startsectionlevel` the new default, but that might need some more discussion. I think Denis linked to the respective GitHub issue in his mail; more comments and opinions are definitely welcome.
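To make the `\startsectionlevel` approach concrete, here is a minimal sketch of a document that expresses its structure only through nesting depth. It assumes that `\startsectionlevel` accepts a `title` key in the same way `\startsection` does; the titles and text are placeholders. Which heading style (part, chapter, section, and so on) each depth receives is left to the setup, as juh describes, rather than being spelled out in the markup.

```tex
% Minimal sketch: structure is expressed only through nesting depth.
% Assumption: \startsectionlevel accepts a title key like \startsection.
\starttext

\startsectionlevel[title={First level}]
  Text at the first level.

  \startsectionlevel[title={Second level}]
    Text at the second level.

    \startsectionlevel[title={Third level}]
      Text at the third level.
    \stopsectionlevel
  \stopsectionlevel
\stopsectionlevel

\stoptext
```

A converter emitting this form would only need to track how deeply a heading is nested, instead of mapping each heading level to a fixed command name such as `\startsubsection`.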
Dear Albert!

On 05.06.22 at 16:15, Albert Krewinkel via ntg-context wrote:
> I'd love to learn more about your workflow, if you have time to share at some point!
I would like to blog about it, but we are still tweaking the process. In short:

Our authors use Markdown:

- to produce websites via Hugo
- to produce print stuff via pandoc-context

Our main concern was the question: how do we roll out the ConTeXt system to all contributors?

We maintain our system in a repo with a bootstrap script. The bootstrap script determines the architecture of the computer, installs LMTX for the right architecture (Linux, Mac, Windows), and generates a post-merge script for git that calls "context --generate" after every update of the repo.

The repo contains ConTeXt stuff and pandoc stuff in two directories. In ../context-project we keep our style files and global images such as logos, icons, etc. In the pandoc directory we store our pandoc templates, filters, CSL styles, and global bibliographies. Last but not least, the repo contains a build script that essentially calls pandoc and context on the given Markdown file.

For each print project we have a separate repository, e.g. for reports, brochures, slides, or offers.

We make use of the yaml preface in the markdown files, especially for offers. The sales consultant only has to fill out the yaml part and build the file. The result is nice-looking and standardized, so that everyone in the coop can write an offer.

One dependency is that everyone must install the newest pandoc and Inkscape to convert SVGs to PDF. With the MP way of doing this, we had some problems. We need the SVG conversion for our documentation. It is special because we would also like to build the HTML version with pandoc from a single Markdown source. In theory this is possible, but there are still some smaller problems to solve.

I am happy to answer questions about the details.

juh

-- 
Hostsharing Press Team | https://www.hostsharing.net
Phone: +49 40 2093313-51 | Fax: +49 40 2093313-52
Hostsharing eG | Flughafenstraße 52a | 22335 Hamburg | Germany
Register court Hamburg, GnR 1007 | VAT ID: DE218602793
Executive Board: Michael Hierweck, Dr. Martin Weigele
Hi folks,

> I would like to blog about it, but we are still tweaking the process.
I put together a multipart series about the process of going from Markdown to Pandoc to ConTeXt:

https://dave.autonoma.ca/blog/2019/05/22/typesetting-markdown-part-1/

To speed up the "write > typeset > review" process, I developed KeenWrite:

https://github.com/DaveJarvis/keenwrite

Behind the scenes, KeenWrite uses a Java library similar to pandoc with some additions, such as the ability to use pandoc's annotation syntax.

The biggest issue for rolling this out is providing users a way to easily install ConTeXt in a cross-platform manner. At the moment, trying to export from KeenWrite without a local ConTeXt install simply directs the user to download and install the most appropriate version for their system.

> We make use of the yaml preface in the markdown files especially for
> offers. The sales consultant only has to fill out the yaml part and
> build the file.
KeenWrite goes a bit beyond this to completely separate YAML variables from the Markdown files. The text editor also provides a hierarchical editor for YAML trees, along with the ability to reference those YAML variables when building. This can be accomplished from the user preferences or the command line:

https://github.com/DaveJarvis/keenwrite/blob/master/docs/cmd.md

What's more, the variables can be inserted into documents, isolating duplicated information to a single location: the externalized YAML data. Great for templating.

One of the reasons I wrote KeenWrite was to simplify the use of variables within documents. In the Typesetting Markdown series, the build script essentially performs:

1. pandoc document-vars.md + vars.yaml > document-final.md
2. pandoc document-final.md > document.tex
3. context document.tex > document.pdf

Using KeenWrite, this process becomes:

    keenwrite --input document-vars.md --theme=boschet --variables vars.yaml --output document.pdf

This converts an annotated Markdown file into XML, then uses ConTeXt to typeset the XML using a particular theme. KeenWrite has a number of themes, some basic, some advanced:

https://github.com/DaveJarvis/keenwrite-themes/

This allows me to eliminate the dependency on both Pandoc and Inkscape. I've also encountered some problems with SVG to MP, but Hans is usually quick to fix the bugs given a minimal working example that pinpoints the problem. Either way, it's possible to retain the Inkscape step by telling ConTeXt not to use the MP conversion, as you alluded to, Juh.

There are other handy features built into KeenWrite. For example, it's possible to separate chapters into individual files. As long as they are named something natural (ch1.md, ch2.md, a_chap.md, b_chap.md), they'll get collated in the correct order. From there, Control+P will export the current file to PDF, and Control+Shift+P will combine all chapters into a single PDF.
Then there's the F12 button that captures errors and output from ConTeXt. If you check it out, let me know what you think! Cheers!
On Sun, Jun 05, 2022 at 04:52:52PM -0700, Thangalin wrote:
> This allows me to eliminate the dependency on both Pandoc and Inkscape. I've also encountered some problems with SVG to MP, but Hans is usually quick to fix the bugs given a minimal working example that pinpoints the problem. Either way, it's possible to retain the Inkscape step by telling ConTeXt not to use the MP conversion, as you alluded to, Juh.
So true. Here is a MWE that works with Inkscape but not with MP. As some SVG files do work with MP, I guess that MP cannot handle all SVG dialects, but this is only a guess.

juh
On 06.06.22 at 01:52, Thangalin wrote:
> I put together a multipart series about the process of going from Markdown to Pandoc to ConTeXt:
>
> https://dave.autonoma.ca/blog/2019/05/22/typesetting-markdown-part-1/
This was one source of inspiration for me.

juh
-----Original Message-----
From: ntg-context
On Behalf Of Albert Krewinkel via ntg-context
Sent: Sunday, 5 June 2022 09:08
To: ntg-context@ntg.nl
Cc: Albert Krewinkel
Subject: [NTG-context] Improving pandoc's ConTeXt output

> Hi all,
> I'm currently trying to improve the ConTeXt output generated by pandoc, the document converter. There are two questions that we haven't decided on, and I'd be grateful to receive some feedback from ConTeXt experts on these issues:
>
> [...]
>
> 2. Similarly, I'd like to start wrapping paragraphs with `\startparagraph`/`\stopparagraph`. It is important for me to get properly tagged PDF, but this would also make the output more verbose. Is that something that you would find bothersome, or do you see adding the extra environment by default as an acceptable practice?
My initial impulse was that this would be good when going directly to PDF, but that it may be disturbing (too verbose) when you intend to edit the ConTeXt output. See also: https://github.com/jgm/pandoc/pull/7885

Denis
On 6/5/2022 9:07 AM, Albert Krewinkel via ntg-context wrote:
> Hi all,
>
> I'm currently trying to improve the ConTeXt output generated by pandoc, the document converter. There are two questions that we haven't decided on, and I'd be grateful to receive some feedback from ConTeXt experts on these issues:
>
> 1. As far as I understand, the `\section` syntax currently produced by pandoc should be considered mkii legacy syntax. We're likely going to switch to the modern `\startsection`/`\stopsection` syntax instead. Are there any concerns about retiring the old syntax? (Side note: pandoc already produces the new syntax, but only when called with `--section-divs`.)
We keep compatibility as much as possible, although there are of course exceptions (for example, we don't need font encodings in mkiv, and we have OpenType fonts there). So it is quite likely that these commands will stay forever.
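For readers less familiar with the two forms in point 1, here is a minimal sketch of both in one document; the titles and text are placeholders, not pandoc's actual output. In the legacy form each heading is a single command and its section ends implicitly at the next heading of the same or a higher level, while the `\startsection`/`\stopsection` form makes the nesting explicit in the source.

```tex
\starttext

% Legacy form: one command per heading; the section ends implicitly
% at the next heading of the same or a higher level.
\section{Introduction}
Some introductory text.

\subsection{Background}
Some background.

% Structured form: explicit start/stop pairs mirror the nesting.
\startsection[title={Method}]
  Some text about the method.

  \startsubsection[title={Details}]
    Some details.
  \stopsubsection
\stopsection

\stoptext
```

Given Hans's expectation that the legacy commands will remain, both forms should be able to coexist in one file as shown here, so switching pandoc's default need not break existing documents.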
> 2. Similarly, I'd like to start wrapping paragraphs with `\startparagraph`/`\stopparagraph`. It is important for me to get properly tagged PDF, but this would also make the output more verbose. Is that something that you would find bothersome, or do you see adding the extra environment by default as an acceptable practice?
I assume that not many users see (or manipulate) the output, so it is harmless.
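To illustrate the verbosity trade-off in point 2, here is a minimal sketch of paragraphs without and with explicit wrapping. It assumes tagging is switched on with `\setuptagging[state=start]` so that paragraphs end up in the PDF structure; the text is placeholder content.

```tex
% Assumption: tagging is enabled so paragraphs appear in the PDF structure.
\setuptagging[state=start]

\starttext

% Unwrapped: a blank line ends a paragraph.
This is the first paragraph.

This is the second paragraph.

% Wrapped: each paragraph is an explicit environment,
% which is more verbose when the output is edited by hand.
\startparagraph
This is the third paragraph.
\stopparagraph

\startparagraph
This is the fourth paragraph.
\stopparagraph

\stoptext
```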
-- 
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
On 05.06.2022 at 12:01, Hans Hagen via ntg-context wrote:
> On 6/5/2022 9:07 AM, Albert Krewinkel via ntg-context wrote:
>> 2. Similarly, I'd like to start wrapping paragraphs with `\startparagraph`/`\stopparagraph`. It is important for me to get properly tagged PDF, but this would also make the output more verbose. Is that something that you would find bothersome, or do you see adding the extra environment by default as an acceptable practice?

> I assume that not many users see (or manipulate) the output, so it is harmless.
\startparagraph can lead to unwanted side effects (I can't remember the details), and \bpar ... \epar is the safer alternative for adding tags.

Wolfgang
Wolfgang Schuster via ntg-context writes:

> On 05.06.2022 at 12:01, Hans Hagen via ntg-context wrote:
>> I assume that not many users see (or manipulate) the output, so it is harmless.

> \startparagraph can lead to unwanted side effects (I can't remember the details), and \bpar ... \epar is the safer alternative for adding tags.
Thank you Wolfgang, I wasn't aware! Searching the wiki brought me to https://wiki.contextgarden.net/Epub_Sample. It states:
> In places where \startparagraph does not work, such as itemizations, where it causes a blank line after the bullet and before the item text, use \bpar (and closing \epar) to tag paragraphs.
It's probably more consistent, then, to use \bpar ... \epar everywhere.
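Following Wolfgang's suggestion and the wiki note, here is a minimal sketch of tagging paragraphs with `\bpar` ... `\epar`, both in running text and inside an itemization, where `\startparagraph` is reported to add a blank line after the bullet. It assumes tagging is enabled with `\setuptagging[state=start]`; the text is placeholder content, and the spacing claim comes from the wiki page quoted above rather than from testing this file.

```tex
% Assumption: tagging is enabled so \bpar/\epar produce paragraph tags.
\setuptagging[state=start]

\starttext

% A normal paragraph, tagged explicitly.
\bpar This paragraph is tagged with bpar and epar markers.\epar

% Inside an itemization, the same markers wrap the item text.
\startitemize
  \item \bpar First item text, tagged as a paragraph.\epar
  \item \bpar Second item text, tagged as a paragraph.\epar
\stopitemize

\stoptext
```

Using the same pair everywhere keeps the generated source uniform, which is the consistency argument made above.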
On 05.06.22 at 16:30, Albert Krewinkel via ntg-context wrote:
> It's probably more consistent, then, to use \bpar ... \epar everywhere.
If I understood Hans right, \start/stopparagraph is no longer necessary to get properly tagged XML. I haven't checked it myself yet; my DOCX-to-ConTeXt converter still produces this markup.

One case where it's probably in the way is when you need settings like \looseness: before \startparagraph is too early, and after it is too late (in LMTX). You need \updateparagraphproperties in such cases, but you can possibly avoid that without \startparagraph. Need to check...

Hraban
Albert Krewinkel via ntg-context writes:
> I'm currently trying to improve the ConTeXt output generated by pandoc, the document converter. There are two questions that we haven't decided on, and I'd be grateful to receive some feedback from ConTeXt experts on these issues:
>
> 1. As far as I understand, the `\section` syntax currently produced by pandoc should be considered mkii legacy syntax. We're likely going to switch to the modern `\startsection`/`\stopsection` syntax instead. Are there any concerns about retiring the old syntax? (Side note: pandoc already produces the new syntax, but only when called with `--section-divs`.)
>
> 2. Similarly, I'd like to start wrapping paragraphs with `\startparagraph`/`\stopparagraph`. It is important for me to get properly tagged PDF, but this would also make the output more verbose. Is that something that you would find bothersome, or do you see adding the extra environment by default as an acceptable practice?
>
> Of course, I'd also be happy to get other suggestions on how to improve pandoc's ConTeXt support.
A big "thank you" for everyone's feedback! We've implemented the first point as suggested, with the second still in discussion. The next pandoc version will also feature better table support for ConTeXt.

Thanks again; ConTeXt is a wonderful tool!

Best,
Albert

PS: For completeness, this is the pull request for better table support: https://github.com/jgm/pandoc/pull/8116
Participants (9):

- Albert Krewinkel
- denis.maier@unibe.ch
- Hans Hagen
- Henning Hraban Ramm
- Jan U. Hasecke
- juh
- juh+ntg-context@mailbox.org
- Thangalin
- Wolfgang Schuster