Straight Quotes / Curly Quotes
I've written a Java-based lexer/parser that can convert straight quotes to curly quotes for English prose. It's a one-pass algorithm (O(n)) that uses neither look-behind nor regex. Here's a list of test cases it handles:
https://raw.githubusercontent.com/DaveJarvis/keenquotes/main/lib/src/test/re...
A test harness converted several Project Gutenberg texts quite well. The folks at PG may be interested in using it themselves to help convert quotes in older texts en masse. The source code is MIT-licensed:
https://github.com/DaveJarvis/keenquotes/
The code should port to Lua fairly easily, should anyone be interested in adding a straight/curly quotation mark conversion module to ConTeXt. (Similar to the LaTeX package, but without using regex.)
Cheers everyone!
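As a rough illustration of what a single regex-free pass over the text can look like, here is a minimal Java sketch. It is not the KeenQuotes algorithm: the class name `QuoteCurler`, the method `curl`, and the open/close heuristic (a quote opens after whitespace, an opening bracket, or another opening quote; otherwise it closes) are all assumptions made for this example. It remembers only the last character written, so the work stays linear in the length of the text.

```java
/** Hypothetical one-pass quote curler; NOT the KeenQuotes implementation. */
final class QuoteCurler {
  private static final char LDQUO = '\u201C'; // left double quotation mark
  private static final char RDQUO = '\u201D'; // right double quotation mark
  private static final char LSQUO = '\u2018'; // left single quotation mark
  private static final char RSQUO = '\u2019'; // right single quote, also used as apostrophe

  private QuoteCurler() {}

  /** Replaces straight quotes with curly ones in one left-to-right pass. */
  static String curl(final String text) {
    final StringBuilder out = new StringBuilder(text.length());
    char prev = ' '; // pretend the text follows whitespace

    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);

      if (c == '"') {
        c = opens(prev) ? LDQUO : RDQUO;
      } else if (c == '\'') {
        // After a letter this yields the closing mark, which doubles as apostrophe.
        c = opens(prev) ? LSQUO : RSQUO;
      }

      out.append(c);
      prev = c; // remember the character as written, so nested openers are seen
    }

    return out.toString();
  }

  /** Assumed heuristic: a quote opens when it follows one of these characters. */
  private static boolean opens(final char prev) {
    return Character.isWhitespace(prev)
        || prev == '(' || prev == '[' || prev == '{'
        || prev == LDQUO || prev == LSQUO;
  }
}
```

Under these assumptions, `QuoteCurler.curl("\"Don't,\" she said.")` gives `“Don’t,” she said.` A heuristic this simple mis-curls leading apostrophes such as 'tis (it emits an opening single quote), which is the kind of case a real lexer/parser has to handle.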
I usually convert all kinds of quotation marks into \quotation{} / \quote{} using the regex search of my editor; a regex replacement is also part of my docx-to-ConTeXt converter script. (I see no need to avoid regexes, but YMMV.)
The biggest problems I face are mixed and wrong quotation marks, e.g. English marks in a German text, a mixture of curly and straight marks, traditional LaTeX quotation marks, and similar mistakes. Some programs default to English single quotes with German double quotes :(
In what kind of workflows does your program make sense? (Please don’t be offended, my view is limited.)
Hraban
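A regex replacement of the kind Hraban describes might look like the following Java sketch. The class and method names are made up for illustration, and the patterns are an assumption, not Hraban's actual editor search or converter script. They only cover un-nested English double quotes, straight or curly; German or mixed marks would need further patterns, which is exactly the problem mentioned above.

```java
/** Hypothetical helper; the patterns are illustrative only. */
final class QuotationMacros {
  private QuotationMacros() {}

  /** Wraps double-quoted spans in the ConTeXt \quotation{} macro. */
  static String toQuotation(final String text) {
    return text
        // Curly pair: opening U+201C up to the next closing U+201D, no nesting.
        .replaceAll("\u201C([^\u201C\u201D]*)\u201D", "\\\\quotation{$1}")
        // Straight pair: straight double quotes are paired up in reading order.
        .replaceAll("\"([^\"]*)\"", "\\\\quotation{$1}");
  }
}
```

For example, `QuotationMacros.toQuotation("She said, \"Hi.\"")` returns `She said, \quotation{Hi.}`. A single unclosed straight quote shifts every later pairing, so the result is only as good as the input.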
On 17.06.2021 at 22:28, Thangalin wrote:
I've written a Java-based lexer/parser that can convert straight quotes to curly quotes for English prose. [...]
On 6/18/2021 12:10 AM, Henning Hraban Ramm wrote:
In what kind of workflows does your program make sense? (Please don’t be offended, my view is limited.)
Lua is normally fast enough to handle it with a few expressions or LPeg, but in the end it depends on how far one wants to go.
For instance, if it is for converting Gutenberg files, that extensive conversion can help ... with intermediate test runs (for instance, coloring quotations quickly shows a runaway that can then be fixed in the input). For instance: "Not all open quotes are closed... It's kind of tricky, because there one needs to know the source, so there is no real universal solution (one could layer it).
In the past we had projects where we did the rendering and used TeX, but the rendering was trivial ... they came to us because we were able to turn crap into something useful (it's unbelievable what can come from databases or be generated by web applications: lack of symmetry, multiple escaping, bad encodings, inconsistencies) ... unfortunately the money is often already spent in getting to the stage where the crap is produced. But anyway, after years one kind of knows that there is always a solution (also because TeX and related tools are so flexible and can help with diagnosing).
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
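The quick test run Hans describes, one that makes a runaway quotation visible before typesetting, could be approximated outside TeX by flagging paragraphs whose straight double quotes do not pair up. This is a sketch under assumptions: `RunawayQuotes` and `suspects` are hypothetical names, paragraphs are taken to be separated by exactly one blank line, and an odd count is only a hint, since English prose can legitimately leave a quotation open at the end of a paragraph when it continues in the next one.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check; not part of ConTeXt or KeenQuotes. */
final class RunawayQuotes {
  private RunawayQuotes() {}

  /** Returns the 1-based numbers of paragraphs with an odd count of straight double quotes. */
  static List<Integer> suspects(final String text) {
    final List<Integer> result = new ArrayList<>();
    // Assumption: one blank line separates paragraphs.
    final String[] paragraphs = text.split("\n\n");

    for (int p = 0; p < paragraphs.length; p++) {
      int quotes = 0;

      for (int i = 0; i < paragraphs[p].length(); i++) {
        if (paragraphs[p].charAt(i) == '"') {
          quotes++;
        }
      }

      // An odd count means at least one quote has no partner in this paragraph.
      if (quotes % 2 != 0) {
        result.add(p + 1);
      }
    }

    return result;
  }
}
```

The flagged paragraphs still need a human decision, which matches the point that there is no universal solution without knowing the source.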
Hraban,
In what kind of workflows does your program make sense?
Have you looked around the web lately?
KeenWrite (https://github.com/DaveJarvis/keenwrite), my plain text editor, can neither convert nor easily type curly quotes into the application. Recently, I added ConTeXt integration for exporting to PDF files. ConTeXt doesn't curl the quotes, which I found a little surprising (because LaTeX has a quote curling package). Not seeing an obvious solution, I coded my own library because all the other libraries I found were either not up to the task or required a massive natural language parser dependency.
My workflow will be: Edit plain text in KeenWrite, export to XHTML, curl the quotes, run ConTeXt to typeset XHTML.
Another workflow: Edit plain text in KeenWrite, export to XHTML, curl the quotes, upload to CMS.
The problem is that when typewriters were invented, curly quotes didn't make it onto the popular layouts. Then, with Unicode, the curly closing single quote and the curly apostrophe were not given distinct characters. HTML entities get it right, though, with l/rdquo, l/rsquo, and apos. C'est la vie.
On 6/18/2021 4:08 AM, Thangalin wrote:
ConTeXt doesn't curl the quotes, which I found a little surprising (because LaTeX has a quote curling package).
What do you mean by 'LaTeX curls quotes'? Can you give an example?
Hans
The csquotes package can curl straight quotes: https://ctan.org/pkg/csquotes I don't know how smart its smart quote feature is, though, with respect to apostrophes.
On 6/18/2021 5:48 PM, Thangalin wrote:
I don't know how smart its smart quote feature is, though, with respect to apostrophes.
Me neither, and as we always had lots of quote-related stuff on board, I'm also not going to explore it ... when apostrophes get translated as you do but with active characters, it's no fun (ok, we still have a few in ConTeXt, like ~ and |).
Just for fun I made {\addff{primes} 123'345''\par} use primes ... in a next upload.
Hans
On 18.06.2021 at 04:08, Thangalin wrote:
The problem is that when typewriters were invented, curly quotes didn't make it onto the popular layouts. [...]
I’m used to typing special characters with key combinations; I even use my own keyboard layout to access more accented characters via dead keys. (Nothing fancy like Neo, just extensions to Apple’s German keyboard layout.) I always wanted to port that to my Linux machine, but even the default (German) keyboard layout for Linux lets me access curly quotes. And I didn’t find a handy keyboard layout editor like “Ukelele” for Linux. Anyway...
Using \quotation / \quote I avoid typing quotation marks in most cases. There are exceptions – Hans mentioned missing or open-ended quotes, and sometimes the nesting of commands gets hairy (if quotations span paragraphs with additional markup), so that I type the quotation marks manually.
I consider it a bad idea to make straight quotation marks (inch marks) active to allow for “curling” them, and would suggest the csquotes package with its \enquote command for LaTeX, even if it’s missing the setups for many languages.
In HTML you should be able to use <q> – I know that doesn’t work reliably in browsers (some add straight quotes to my CSS-configured guillemets).
Anyway, sorry for being negative on your project. It’s great if it helps you and others.
Hraban
In HTML you should be able to use <q> – I know that doesn’t work reliably in browsers (some add straight quotes to my CSS-configured guillemets).
The Converter class maps token replacements: https://github.com/DaveJarvis/keenquotes/blob/d6c9761f8fe1ae96391f25dc73be52... It'd be trivial to use <q> and </q>, instead. For my purposes, HTML entities work.
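To illustrate why switching the output is a small change, here is a sketch of a token-to-replacement table. It does not reproduce the KeenQuotes Converter class: `QuoteReplacements`, the `Token` names, and both tables are assumptions for this example. Swapping the entity table for the `<q>` table changes only the strings, not the code that walks the tokens.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical replacement tables; not the actual KeenQuotes Converter. */
final class QuoteReplacements {
  /** Assumed token kinds that a quote lexer might emit. */
  enum Token { OPENING_DOUBLE, CLOSING_DOUBLE, OPENING_SINGLE, CLOSING_SINGLE, APOSTROPHE }

  private QuoteReplacements() {}

  /** Maps each token to an HTML entity. */
  static Map<Token, String> entities() {
    final Map<Token, String> map = new EnumMap<Token, String>(Token.class);
    map.put(Token.OPENING_DOUBLE, "&ldquo;");
    map.put(Token.CLOSING_DOUBLE, "&rdquo;");
    map.put(Token.OPENING_SINGLE, "&lsquo;");
    map.put(Token.CLOSING_SINGLE, "&rsquo;");
    map.put(Token.APOSTROPHE, "&apos;");
    return map;
  }

  /** Same table, but quotations become q elements; apostrophes stay entities. */
  static Map<Token, String> qElements() {
    final Map<Token, String> map = entities();
    map.put(Token.OPENING_DOUBLE, "<q>");
    map.put(Token.CLOSING_DOUBLE, "</q>");
    map.put(Token.OPENING_SINGLE, "<q>");
    map.put(Token.CLOSING_SINGLE, "</q>");
    return map;
  }

  /** Joins plain strings and tokens, looking up each token in the given table. */
  static String render(final Map<Token, String> table, final Object... parts) {
    final StringBuilder out = new StringBuilder();
    for (final Object part : parts) {
      out.append(part instanceof Token ? table.get(part) : part);
    }
    return out.toString();
  }
}
```

One caveat with the element table: an unbalanced quotation in the input yields an unclosed `<q>`, so XHTML output would need a balance check first, whereas a stray entity is harmless.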
Using \quotation / \quote I avoid typing quotation marks in most cases.
When writing plain text documents, adding TeX code or HTML code to prescribe how the document should be presented is best avoided, so as to keep the document decoupled from a particular tool chain. YMMV. A deeper solution allows users to type the correctly curled quotes directly into the document.
On 6/18/2021 6:05 PM, Thangalin wrote:
When writing plain text documents, adding TeX code or HTML code to prescribe how the document should be presented is best avoided, so as to keep the document decoupled from a particular tool chain. YMMV. A deeper solution allows users to type the correctly curled quotes directly into the document.
As with many things today, this quote is rather English-language centered ... TeX operates in a multilingual domain, and quotes have always been dealt with using macros so that we can be sure we get the right ones (left/right) with the right spacing.
Hans
participants (3)
- Hans Hagen
- Henning Hraban Ramm
- Thangalin