I have very recently launched a new journal which has been designed on the assumption that it will exist both in electronic form and in print; hence, it is produced using ConTeXt and exists natively in PDF files. This morning I was asked by a colleague who is totally blind whether it would be possible for him to have ASCII or .txt files that he could use easily with his screen reading software. (My sense is that he may be able to use PDF files with this software, but that it is not easy.)

So, does anyone on the list have ideas about how to produce such files from the files I currently have in hand, or any experience with this sort of problem? Is there, for instance, a way to strip away all the formatting commands from a ConTeXt source file automatically so as to leave an unencoded .txt file that I could send him? I gather that he can use .htm files, but so far as I can tell there is no path from a ConTeXt source file to an HTML file; at least, a specific query about this made recently on this list by someone else seems to have gone unanswered.

Cheers,
Alan
On Wed, 14 Apr 2004 16:50:04 -0400, Alan Bowen wrote:
So, does anyone on the list have ideas about how to produce such files from the files I currently have in hand or any experience with this sort of problem?
I have used the pdftotext utility, part of the xpdf package, for similar tasks. In the case of hyphenated line endings, the word will be hyphenated and broken across lines just as in the PDF, and that might be a problem for the reader program.

-Bill

--
Sattre Press http://sattre-press.com/ info@sattre-press.com
The King in Yellow by Robert W. Chambers http://sattre-press.com/kiy.html
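The hyphenation problem mentioned above could perhaps be softened by post-processing the extracted text. What follows is only a minimal sketch in Python, not part of pdftotext or xpdf; the function name `rejoin_hyphenated` is made up for illustration, and the rule it applies (lowercase letter, hyphen, line break, lowercase letter) is an assumption about what a typical break looks like.

```python
import re

# A lowercase letter, a hyphen at the very end of a line, optional indentation,
# and a lowercase letter starting the next line.
_BROKEN_WORD = re.compile(r"([a-z])-\n[ \t]*([a-z])")


def rejoin_hyphenated(text: str) -> str:
    """Join words that were split with a hyphen at a line ending."""
    return _BROKEN_WORD.sub(r"\1\2", text)


if __name__ == "__main__":
    sample = "The journal is pro-\nduced using ConTeXt.\n"
    print(rejoin_hyphenated(sample))
```

One limitation to keep in mind: a genuine compound that happens to break at its hyphen (say "screen-" followed by "reading") would be merged into a single word, so the result may still need a quick check.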
In my own experience, pdftotext also has trouble handling multicolumn documents. Adobe has an online utility for transforming PDF to HTML, which can rather easily be turned into text. It worked pretty well for me, breaking columns into something useful instead of mashing all the text together.

Regards,
Erik Hetzner
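For the "HTML turned into text" step, a small script may be enough. The sketch below uses only Python's standard `html.parser` module; the class name `TextExtractor`, the function `html_to_text`, and the choice of which tags start a new line are assumptions for illustration, and nothing here is specific to the output of the Adobe utility.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML document, skipping script/style."""

    BLOCK_TAGS = {"p", "div", "br", "h1", "h2", "h3", "li", "tr"}
    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self.skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            # Start block-level elements on a fresh line.
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the visible text, one non-empty line per block element."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = [" ".join(line.split()) for line in "".join(parser.parts).splitlines()]
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    print(html_to_text("<h1>Title</h1><p>First &amp; second.</p>"))
```

Tables and multi-column layouts expressed through CSS would need more care than this; the sketch only flattens the document in source order.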
You'd have to do it a file at a time, but does the Acrobat Reader's "save as text" function do what you need?

A much bigger solution would be to have your source as XML and then go from there to ConTeXt and PDF, or straight to plain text via XSLT.

Matt

_______________________________________________
ntg-context mailing list
ntg-context@ntg.nl
http://www.ntg.nl/mailman/listinfo/ntg-context
Bill, Erik, and Matthew,

Thank you very much for the suggestions. I will explore pdftotext and the Acrobat “Save As” options. One of the problems for all (and perhaps it is insuperable) is the ability of such reading software to present phrases in foreign languages and mathematical expressions. I will report to you at least on what I discover.

Best,
Alan
On Thu, 15 Apr 2004 10:00:12 -0400, Alan Bowen wrote:
Thank you very much for the suggestions. I will explore pdftotext and the Acrobat “Save As” options.
Another issue with these methods is that the header and footer information on each page will be included, which could be irritating or helpful, depending on the application.
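If the running headers and footers turn out to be irritating, they could in principle be filtered out of the extracted text afterwards. The following Python sketch is one possible approach under stated assumptions: that pages in the pdftotext output are separated by form feed characters (its default behaviour), and that a header or footer is the first or last line of a page and repeats on most pages once digits are ignored. The function name `strip_running_lines` and the `min_share` threshold are invented for this illustration.

```python
import re
from collections import Counter


def _key(line: str) -> str:
    """Normalise a line so that page numbers do not make lines look different."""
    return re.sub(r"\d+", "#", line.strip())


def strip_running_lines(text: str, min_share: float = 0.6) -> str:
    """Drop first/last lines that repeat on most pages of the extracted text."""
    # Split on form feeds and ignore empty trailing pages.
    pages = [p.strip("\n").split("\n") for p in text.split("\f") if p.strip()]
    if len(pages) < 2:
        return text
    firsts = Counter(_key(page[0]) for page in pages)
    lasts = Counter(_key(page[-1]) for page in pages)
    threshold = min_share * len(pages)
    cleaned = []
    for lines in pages:
        if firsts[_key(lines[0])] >= threshold:
            lines = lines[1:]
        if lines and lasts[_key(lines[-1])] >= threshold:
            lines = lines[:-1]
        cleaned.append("\n".join(lines).strip("\n"))
    return "\n\n".join(cleaned)
```

A short document, or one whose pages each end in a one-line numbered item, could fool this heuristic, so it is better treated as a starting point than as a reliable filter.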
One of the problems for all (and perhaps it is insuperable) is the ability of such reading software to present phrases in foreign languages and mathematical expressions.
I haven't done any XML writing, but I think that would be the superior approach. If special elements of the text are tagged, then they could be translated appropriately for the blind reader.

I use a text-to-speech program for proofing some of my documents and have found it helpful to filter the original text and emit a coded version which makes it easy for the speech program to read, and easier for me to understand. I'll have it say "quote", "endquote", "italics", etc. I'm working from the ConTeXt source directly, but XML sources could be used similarly, and there are lots of XML tools in the world.

-Bill

--
Sattre Press http://sattre-press.com/ info@sattre-press.com
History of Astronomy During the 19th Century by Agnes M. Clerke http://sattre-press.com/han.html
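To make the idea of a "coded version" concrete, here is a rough Python sketch of such a filter. It is not Bill's actual script, which the thread does not show. It assumes the source marks quotations with `\quotation{...}` or `\quote{...}` and emphasis with `{\em ...}` or `{\it ...}`; the cue "end italics" and the function name `add_spoken_cues` are hypothetical additions, and only one level of nesting is handled.

```python
import re


def add_spoken_cues(source: str) -> str:
    """Replace a few ConTeXt constructs with words a speech program can read."""
    # Emphasis first, so that emphasis inside a quotation is handled.
    text = re.sub(r"\{\\(?:em|it)\s+([^{}]*)\}", r" italics \1 end italics ", source)
    # \quotation{...} and \quote{...} become spoken quote markers.
    text = re.sub(r"\\quot(?:ation|e)\s*\{([^{}]*)\}", r" quote \1 endquote ", text)
    # Drop any remaining commands together with their [bracketed] options.
    text = re.sub(r"\\[A-Za-z]+(?:\[[^\]]*\])*", " ", text)
    # Remove leftover grouping braces and tidy the whitespace.
    text = text.replace("{", " ").replace("}", " ")
    return " ".join(text.split())


if __name__ == "__main__":
    print(add_spoken_cues(r"He said \quotation{read {\em this} aloud} today."))
```

Mathematics and foreign-language phrases, the harder cases raised earlier in the thread, are not addressed by this kind of simple substitution.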
On Wed, Apr 14, 2004 at 04:50:04PM -0400, Alan Bowen wrote:
So, does anyone on the list have ideas about how to produce such files from the files I currently have in hand or any experience with this sort of problem? Is there, for instance, a way to strip away all the formatting commands from a ConTeXt source file automatically so as to leave an unencoded .txt file that I could send him? I gather that he can use .htm files, but so far as I can tell there is no path from a ConTeXt source file to an HTML file; at least, a specific query about this made recently on this list by someone else seems to have gone unanswered.
There is a utility called untex that strips LaTeX formatting from a TeX file. I didn't test it with ConTeXt, but it may work too. If you can produce a DVI file, there are a couple of programs, dvi2tty and catdvi, that can extract text from a DVI file. Also, pdftotext, which I believe is a part of the xpdf package, can extract text from many PDF files.

Finally, there is a program called tex2page that converts TeX to HTML. Unlike latex2html, it can handle at least some plain TeX, so it may be possible to use it on ConTeXt files. Again, I didn't try it. If you want to experiment with it, it is at http://www.ccs.neu.edu/home/dorai/tex2page/tex2page-doc.html

--
Jan Hlavacek (260) 434-7566
Department of Mathematics Jhlavacek@sf.edu
University of Saint Francis http://www.sf.edu/jhlavacek/
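For anyone who wants to try several of these tools on a batch of files, a small wrapper might save some typing. The Python sketch below is untested against real ConTeXt output and rests on assumptions: that the tools are on the PATH, that `pdftotext file.pdf -` writes to standard output, and that untex and catdvi print the extracted text to standard output when given a file name. The names `EXTRACTORS`, `build_command`, and `extract_text` are invented for this example.

```python
import shutil
import subprocess
from pathlib import Path

# One extraction tool per file type, as named in the message above.
# "{src}" is replaced by the input file name.
EXTRACTORS = {
    ".pdf": ["pdftotext", "{src}", "-"],
    ".dvi": ["catdvi", "{src}"],
    ".tex": ["untex", "{src}"],
}


def build_command(src: str) -> list:
    """Return the command line that would extract text from src."""
    suffix = Path(src).suffix.lower()
    if suffix not in EXTRACTORS:
        raise ValueError(f"no extractor known for {suffix!r}")
    return [part.format(src=src) for part in EXTRACTORS[suffix]]


def extract_text(src: str) -> str:
    """Run the matching tool and return whatever it prints."""
    command = build_command(src)
    if shutil.which(command[0]) is None:
        raise RuntimeError(f"{command[0]} is not installed")
    result = subprocess.run(command, capture_output=True, check=True)
    # The output encoding depends on the tool; UTF-8 is an assumption here.
    return result.stdout.decode("utf-8", errors="replace")
```

A call such as `extract_text("issue1.pdf")`, with a hypothetical file name, would then return the text for further clean-up.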
Since most ConTeXt commands are instances of more generic ones, you can define another style to process the file into something suited to blind readers, say:

  \setuphead[chapter][style=normal]

but that could be a lot of work. Simpler is to use pdftotext, which works OK for most cases;

  \setuplayout[header=0pt,footer=0pt]
  \setupcolumns[n=1]

is then probably enough.

By the way, there are ways to get auditive info into the PDF file, for instance to let the voice engine speak, and so on.

Hans
participants (6)
- Alan Bowen
- Bill McClain
- Erik Hetzner
- Hans Hagen
- Jan Hlavacek
- Matthew Huggett