strange (?) exporting from pdf
Hi all,
I was trying to convert from PDF to RTF in order to share documents with non-ConTeXt people. Acrobat 7.0 can export to many formats with "Save As". When I convert a PDF created with MS Word (or something similar; I also tried some online PDFs), I have essentially no problems. But when I convert PDFs created with ConTeXt or LaTeX, there are no blank spaces in the output RTF. Also, accents become separate ' characters (as in the source). This seems to be systematic: the same happens with conversion to DOC or HTML, and also if I use the Trapeze converter instead of Acrobat.
E.g.: pdf in --> out (rtf, doc, ...):
questo è un test --> questo`euntest
I suppose it depends on how the PDF source is generated. Any hints?
Thanks a lot
-a-
Andrea Valle
Laboratorio multimediale "G. Quazza"
Facoltà di Scienze della Formazione
Università degli Studi di Torino
andrea.valle@unito.it
andrea valle wrote:
E.g.: pdf in --> out (rtf, doc, ...):
questo è un test --> questo`euntest
I suppose it depends on pdf source generation. Any hints?
TeX does not have a space character; interword spacing ends up as skips (glue). Also, slot 32 is sometimes used for whatever character needs a slot. Your problem is not related to pdfTeX but to a bug in the exporter, which is unable to handle arbitrary encodings. One option is to use the texnansi encoding, which is the least problematic one.
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
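Hans's first point can be illustrated with a small sketch. It assumes, for illustration only, that the exporter rebuilds words from glyph positions, which PDF text extractors commonly do heuristically: because TeX emits interword space as glue (plain horizontal movement) rather than as a space glyph, the extractor has to guess where one word ends and the next begins. The Glyph class, the layout helper, the gap_ratio threshold and the 3.3 pt interword gap are hypothetical values, not taken from Acrobat, Trapeze or pdfTeX.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str     # character decoded for this glyph
    x: float      # left edge on the baseline, in points
    width: float  # advance width, in points


def layout(words, char_width=5.0, interword=3.3):
    """Place glyphs the way a TeX line is built: word spaces become a
    horizontal move (glue), so no space glyph is ever emitted."""
    glyphs, x = [], 0.0
    for i, word in enumerate(words):
        if i:
            x += interword  # glue: movement only, nothing is drawn
        for ch in word:
            glyphs.append(Glyph(ch, x, char_width))
            x += char_width
    return glyphs


def join_glyphs(glyphs, gap_ratio=0.25, font_size=10.0):
    """Rebuild a line of text, inserting a space wherever the gap between
    consecutive glyphs exceeds gap_ratio * font_size."""
    out, prev_end = [], None
    for g in glyphs:
        if prev_end is not None and g.x - prev_end > gap_ratio * font_size:
            out.append(" ")
        out.append(g.char)
        prev_end = g.x + g.width
    return "".join(out)


line = layout(["questo", "è", "un", "test"])
print(join_glyphs(line))                 # questo è un test
print(join_glyphs(line, gap_ratio=1.0))  # questoèuntest
```

With a threshold larger than the interword gap, the words run together just as in the questo`euntest example; which heuristic a given exporter really uses is not known here.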
On 2005-10-17 at 10:27, Hans Hagen wrote:
tex does not have a space, and spacing ends up in skips; also, sometimes slot 32 is used for whatever char needs a slot; your problem is not related to pdftex, but a bug in the exporter which is unable to handle arbitrary encodings. An option is to use texnansi encoding, which is the least problematic one
I just read that Acrobat has had an export bug since 6.0 (still present in the new 7.0.5) that sometimes also eats spaces and accented characters from MS Word and other sources.
Greetings from Hraban!
---
http://www.fiee.net/texnique/
http://contextgarden.net
http://www.cacert.org (I'm an assurer)
Thanks to all. I'm still struggling to find an easy way to share common documents with the non-ConTeXt world. I thought I had solved it by going directly from the final PDF output to DOC/RTF, but it seems that I will have to give up.
Best
-a-
_______________________________________________ ntg-context mailing list ntg-context@ntg.nl http://www.ntg.nl/mailman/listinfo/ntg-context
Andrea Valle DAMS - Facoltà di Scienze della Formazione Università degli Studi di Torino andrea.valle@unito.it
On 2005-10-18 at 01:10, andrea valle wrote:
Thanks to all. I'm still struggling to find a way to share easily common documents with non-Context world. I thought I would have solved passing directly from the final pdf output to doc/rtf format, but it seems that I will have to give up.
Seems like the best way would be an XML source that you can process with ConTeXt into PDF, or with XSLT into something completely different...
Yes, but the main problem is the "something completely different": I would have to implement RTF as an XSLT output format (XHTML would be feasible).
-a-
On 18 Oct 2005, at 10:07, Henning Hraban Ramm wrote:
Seems like the best way would be an XML source that you can process with ConTeXt to PDF or with XSLT to something completely different...
Yes, but the main problem is the "something completely different". I should implement rtf format as the output of XSLT (XHTML would be feasible).
I'm not an XML guru (I've never tried XML with ConTeXt), but I guess if you use an XML format like DocBook (or even OpenOffice's), there'd be a ready-to-use way to RTF.
Thanks a lot, I was in fact investigating OO.
-a-
On 18 Oct 2005, at 11:10, Henning Hraban Ramm wrote:
I'm not an XML guru (never tried XML with ConTeXt), but I guess if you use an XML format like DocBook (or even OpenOffice's) there'd be a ready-to-use way for RTF.
Thanks a lot, I was in fact investigating OO.
If you make something useful out of the OpenDocument Format (or OOo's old format), please share it; I guess an OOo-to-ConTeXt converter would help some people.
(Sorry, the last one was a message sent two days ago from a wrong address.) Yes, if I come up with something useful, I will surely keep the list informed.
Best
-a-
On 18 Oct 2005, at 22:09, Henning Hraban Ramm wrote:
If you make up something useful from OpenDocumentFormat (or OOo's old format), please share it – I guess an OOo to ConTeXt converter would help some people.
Henning Hraban Ramm wrote:
If you make up something useful from OpenDocumentFormat (or OOo's old format), please share it – I guess an OOo to ConTeXt converter would help some people.
There is an exporter from OOo to LaTeX and XHTML, maintained by Henrik Just: Writer2LaTeX (http://www.hj-gym.dk/~hj/writer2latex/). I've asked him whether he could add ConTeXt export, but he has no time to do it right now. So if someone is willing to ...
Regards
--
Olivier Turlier
Henning Hraban Ramm wrote:
If you make up something useful from OpenDocumentFormat (or OOo's old format), please share it – I guess an OOo to ConTeXt converter would help some people.
Actually, I did some of that a while ago in an experimental project; it's doable as long as one does not mess around too much with tabs in OOo (not that well-structured XML).
Hans
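Hans's remark about tabs can be made concrete with Python's standard xml.etree.ElementTree. The sketch assumes the OpenDocument text namespace (the older OOo 1.x format uses a different one) and the ODF inline elements text:s, text:tab and text:line-break; the function name odf_inline_text is hypothetical, and this is not Hans's experimental converter. It flattens a paragraph to plain text, which shows what a converter actually receives for a tab-aligned line.

```python
import xml.etree.ElementTree as ET

# Namespace of ODF text elements (OpenDocument; OOo 1.x used another URI).
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
T = "{%s}" % TEXT_NS


def odf_inline_text(elem):
    """Flatten the inline content of an ODF paragraph into plain text."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == T + "s":
            # <text:s text:c="n"/> encodes n consecutive spaces (default 1)
            parts.append(" " * int(child.get(T + "c", "1")))
        elif child.tag == T + "tab":
            parts.append("\t")  # only visual alignment, no structure
        elif child.tag == T + "line-break":
            parts.append("\n")
        else:
            # e.g. <text:span>: recurse into nested inline markup
            parts.append(odf_inline_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


sample = f"""<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="{TEXT_NS}">
  <office:body><office:text>
    <text:p>Name<text:tab/>Value</text:p>
    <text:p>a<text:s text:c="3"/>b <text:span>nested</text:span> text</text:p>
  </office:text></office:body>
</office:document-content>"""

root = ET.fromstring(sample)
for p in root.iter(T + "p"):
    print(repr(odf_inline_text(p)))
```

The first paragraph comes out as "Name", a tab character, "Value": the tab records only visual alignment, so a converter cannot tell whether the author meant a table, a description list or something else.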
After tweaking OO a bit, I realized that defining the format options precisely is not my goal. My goal is to have structured text as input (the model), mapped to different outputs (the views). So my minimal, low-cost idea for a setup is the following:
1. Write your document in XHTML.
2. To interchange: open it directly in OO and convert it to RTF (DOC, etc.). This is the strength of (X)HTML.
3. To get (great) ConTeXt output, either:
   3.1 use the XML commands in ConTeXt, or
   3.2 translate it into ConTeXt with a regular-expression-based Python script (my default).
   This is the strength of X(HT)ML.
-a-
On 25 Oct 2005, at 00:51, Hans Hagen Test wrote:
actually i've done some of that some time ago in an experimental project; doable as long as in oo one does not mess round too much with tabs (not that well structured xml)
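Step 3.2 can be sketched as a chain of regular-expression substitutions. This is a minimal illustration, not Andrea's actual script: the three rules, the xhtml_to_context name and the tag subset (h1, p and em mapped to ConTeXt's \section and \emph) are hypothetical, and neither XML entities nor TeX special characters are handled.

```python
import re

# Hypothetical rules for a flat XHTML subset; (.*?) is non-greedy, so each
# match ends at the first closing tag found.
RULES = [
    (re.compile(r"<h1>(.*?)</h1>", re.S), r"\\section{\1}\n\n"),
    (re.compile(r"<em>(.*?)</em>", re.S), r"\\emph{\1}"),
    (re.compile(r"<p>(.*?)</p>", re.S), r"\1\n\n"),
]


def xhtml_to_context(src):
    """Apply each substitution rule, in order, to the whole source string."""
    for pattern, replacement in RULES:
        src = pattern.sub(replacement, src)
    return src


print(xhtml_to_context("<h1>Intro</h1><p>A <em>short</em> note.</p>"))
print(xhtml_to_context("<p><em>a <em>b</em> c</em></p>"))
```

This works for flat markup, but because the pattern stops at the first closing tag, the nested input in the second call comes out as \emph{a <em>b} c</em> followed by a blank line.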
andrea valle wrote:
3.2 translate in ConTeXt via a regular expression based python script
I've been there and done that, and I won't do it again. Use a real XML parser: regular expression engines are a wonderful thing, but not really usable for parsing nested structures. I'm sure there are many, many XML parsers for Python to choose from.
Regards,
Christopher
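Christopher's suggestion can be sketched with Python's standard xml.etree.ElementTree, which builds a tree so that nesting is handled by recursion rather than by pattern matching. The WRAPPERS table and the convert function are hypothetical illustrations covering a small h1/p/em subset, not a complete XHTML-to-ConTeXt converter; XHTML namespaces are stripped from tag names, and TeX special characters in the text are not escaped.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML element -> (opening, closing) ConTeXt text.
WRAPPERS = {
    "h1": ("\\section{", "}\n\n"),
    "p": ("", "\n\n"),
    "em": ("\\emph{", "}"),
}


def local_name(tag):
    """Drop an XML namespace prefix such as {http://www.w3.org/1999/xhtml}."""
    return tag.rsplit("}", 1)[-1]


def convert(elem):
    """Translate an element and everything nested in it, depth-first.
    Unknown elements contribute only their text content."""
    start, end = WRAPPERS.get(local_name(elem.tag), ("", ""))
    parts = [start, elem.text or ""]
    for child in elem:
        parts.append(convert(child))  # recursion handles any nesting depth
        parts.append(child.tail or "")
    parts.append(end)
    return "".join(parts)


doc = ET.fromstring("<body><h1>Intro</h1><p><em>a <em>b</em> c</em></p></body>")
print(convert(doc))
```

Here the nested emphasis comes out balanced as \emph{a \emph{b} c}. Passing unknown elements through with only their text is one simple design choice; a stricter converter might report them instead.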
I agree with you (and it's true there are many Python XML parsers). Actually, to share drafts of documents (which is my minimal purpose), I need very simple tagging, and it's convenient to use my little parser. When the complexity passes my (low) manageability threshold, I will move to the DOM API.
Best
-a-
On 3 Nov 2005, at 08:58, Christopher Creutzig wrote:
I've been there and done that and I won't do it again. Use a real xml parser; regular expression engines are a wonderful thing, but not really usable for parsing nested structures. I'm sure there are many, many xml parsers for python to choose from.
participants (7)
- andrea valle
- andrea valle
- Christopher Creutzig
- Hans Hagen
- Hans Hagen Test
- Henning Hraban Ramm
- olivier Turlier