strange (?) exporting from pdf


andrea valle

15 Oct 2005

5:28 p.m.

Hi to all,

I was trying to convert from PDF to RTF in order to share docs with non-ConTeXt people. Acrobat 7.0's "Save As" can export to many formats. When I convert a PDF created with MS Word (or something similar; I also tried some PDFs found online) I have essentially no problems. But when I convert PDFs created with ConTeXt or LaTeX, the output RTF has no blank spaces. Also, accents become separate characters (as in the source). This seems to be systematic: same behavior with conversion to DOC or HTML, same behavior if I use the Trapeze converter instead of Acrobat.

E.g.: pdf in --> out (rtf, doc, ...):

questo è un test --> questo`euntest

I suppose it depends on how the PDF source is generated. Any hints?

Thanks a lot

-a-

Andrea Valle
Laboratorio multimediale "G. Quazza"
Facoltà di Scienze della Formazione
Università degli Studi di Torino
andrea.valle@unito.it


Hans Hagen

17 Oct 2005

10:27 a.m.

andrea valle wrote:

Hi to all, I was trying to convert from PDF to RTF in order to share docs with non-ConTeXt people. Acrobat 7.0's "Save As" can export to many formats. When I convert a PDF created with MS Word (or something similar; I also tried some PDFs found online) I have essentially no problems. But when I convert PDFs created with ConTeXt or LaTeX, the output RTF has no blank spaces. Also, accents become separate characters (as in the source). This seems to be systematic: same behavior with conversion to DOC or HTML, same behavior if I use the Trapeze converter instead of Acrobat.

E.g.: pdf in --> out (rtf, doc, ...):

questo è un test --> questo`euntest

I suppose it depends on how the PDF source is generated. Any hints?

TeX does not have a space, and spacing ends up in skips. Also, slot 32 is sometimes used for whatever character needs a slot. Your problem is not related to pdfTeX; it is a bug in the exporter, which is unable to handle arbitrary encodings. An option is to use the texnansi encoding, which is the least problematic one.

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74
www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
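The remark about skips can be illustrated with a small sketch: if a PDF places glyphs by position and never emits a space character, a converter has to guess word breaks from the horizontal gaps. The code below is only an illustration under stated assumptions. The glyph records (character, x position, advance width) and the `gap_ratio` threshold are hypothetical and are not taken from Acrobat, Trapeze, or any PDF library.

```python
from typing import List, Tuple

# Hypothetical glyph record: (character, x position, advance width).
Glyph = Tuple[str, float, float]

def rebuild_text(glyphs: List[Glyph], gap_ratio: float = 0.25) -> str:
    """Rebuild a line of text, inserting a space wherever the gap to the
    previous glyph exceeds gap_ratio times that glyph's width."""
    out = []
    prev_end = None      # right edge of the previous glyph
    prev_width = 0.0     # width of the previous glyph
    for ch, x, width in glyphs:
        if prev_end is not None and x - prev_end > gap_ratio * prev_width:
            out.append(" ")  # the gap is wide enough to count as a word break
        out.append(ch)
        prev_end = x + width
        prev_width = width
    return "".join(out)
```

An exporter that skips this kind of inference and only copies character codes would produce run-together output like the one reported above.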

Henning Hraban Ramm

10:38 p.m.

On 2005-10-17 at 10:27, Hans Hagen wrote:

I was trying to convert from PDF to RTF in order to share docs with non-ConTeXt people. Acrobat 7.0's "Save As" can export to many formats. When I convert a PDF created with MS Word (or something similar; I also tried some PDFs found online) I have essentially no problems. But when I convert PDFs created with ConTeXt or LaTeX, the output RTF has no blank spaces. Also, accents become separate characters (as in the source). This seems to be systematic: same behavior with conversion to DOC or HTML, same behavior if I use the Trapeze converter instead of Acrobat.

TeX does not have a space, and spacing ends up in skips. Also, slot 32 is sometimes used for whatever character needs a slot. Your problem is not related to pdfTeX; it is a bug in the exporter, which is unable to handle arbitrary encodings. An option is to use the texnansi encoding, which is the least problematic one.

I just read that Acrobat has had an export bug since 6.0 (it still exists in the new 7.0.5) that sometimes also eats spaces and accented characters from MS Word and other sources.

Greetings from Hraban!
---
http://www.fiee.net/texnique/
http://contextgarden.net
http://www.cacert.org (I'm an assurer)

andrea valle

18 Oct 2005

1:10 a.m.

Thanks to all. I'm still struggling to find an easy way to share ordinary documents with the non-ConTeXt world. I thought I had solved it by going directly from the final PDF output to DOC/RTF format, but it seems that I will have to give up.

Best

-a-

Andrea Valle
DAMS - Facoltà di Scienze della Formazione
Università degli Studi di Torino
andrea.valle@unito.it

_______________________________________________
ntg-context mailing list
ntg-context@ntg.nl
http://www.ntg.nl/mailman/listinfo/ntg-context

Henning Hraban Ramm

10:07 a.m.

On 2005-10-18 at 01:10, andrea valle wrote:

Thanks to all. I'm still struggling to find an easy way to share ordinary documents with the non-ConTeXt world. I thought I had solved it by going directly from the final PDF output to DOC/RTF format, but it seems that I will have to give up.

Seems like the best way would be an XML source that you can process with ConTeXt to PDF, or with XSLT to something completely different...

Greetings from Hraban!

andrea valle

10:28 a.m.

Yes, but the main problem is the "something completely different": I would have to implement RTF as an output format of the XSLT myself (XHTML would be feasible).

-a-


Henning Hraban Ramm

11:10 a.m.

Seems like the best way would be an XML source that you can process with ConTeXt to PDF, or with XSLT to something completely different...

Yes, but the main problem is the "something completely different": I would have to implement RTF as an output format of the XSLT myself (XHTML would be feasible).

I'm not an XML guru (I never tried XML with ConTeXt), but I guess that if you use an XML format like DocBook (or even OpenOffice's) there would be a ready-to-use route to RTF.

Greetings from Hraban!

andrea valle

12:04 p.m.

Thanks a lot, I was in fact investigating OO.

-a-

Henning Hraban Ramm

10:09 p.m.

I'm not an XML guru (I never tried XML with ConTeXt), but I guess that if you use an XML format like DocBook (or even OpenOffice's) there would be a ready-to-use route to RTF.

Thanks a lot, I was in fact investigating OO.

If you come up with something useful based on the OpenDocument format (or OOo's old format), please share it; I guess an OOo-to-ConTeXt converter would help some people.

Greetings from Hraban!

andrea valle

20 Oct 2005

3:10 p.m.

(Sorry, the last one was a message sent two days ago from a wrong address.) Yes, if something useful comes out of it, I will certainly keep the list informed.

Best

-a-

olivier Turlier

9:47 p.m.

Henning Hraban Ramm wrote:

I'm not an XML guru (I never tried XML with ConTeXt), but I guess that if you use an XML format like DocBook (or even OpenOffice's) there would be a ready-to-use route to RTF.

Thanks a lot, I was in fact investigating OO.

If you come up with something useful based on the OpenDocument format (or OOo's old format), please share it; I guess an OOo-to-ConTeXt converter would help some people.

There is an exporter from OO to LaTeX and XHTML, maintained by Henrik Just: Writer2LaTeX (http://www.hj-gym.dk/~hj/writer2latex/). I've asked him whether he could add ConTeXt export, but he has no time to do it right now. So if someone is willing to ...

Regards
--
olivier Turlier

Hans Hagen Test

25 Oct 2005

12:51 a.m.

Henning Hraban Ramm wrote:

I'm not an XML guru (I never tried XML with ConTeXt), but I guess that if you use an XML format like DocBook (or even OpenOffice's) there would be a ready-to-use route to RTF.

Thanks a lot, I was in fact investigating OO.

If you come up with something useful based on the OpenDocument format (or OOo's old format), please share it; I guess an OOo-to-ConTeXt converter would help some people.

Actually, I did some of that some time ago in an experimental project; it is doable as long as one does not mess around too much with tabs in OO (the XML is not that well structured).

Hans

andrea valle

11:25 a.m.

After tweaking OO a bit, I realized that precisely defining the format options is not my goal. My goal is to have structured text as input (the model), to be mapped to different outputs (the views). So my minimal, low-cost idea for a setup is the following:

1. Write your doc in XHTML.
2. To interchange: open it directly in OO and convert it to RTF (DOC, etc.). This is the strength of (X)HTML.
3. To get (great) ConTeXt output:
   3.1 use the XML commands in ConTeXt
   3.2 translate to ConTeXt via a regular-expression-based Python script (my default)
   This is the strength of X(HT)ML.

-a-
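For illustration, a regular-expression-based translation of the kind mentioned in step 3.2 could look roughly like the sketch below. It is not the script from this thread. The tag set, the ConTeXt commands chosen, and the function name `xhtml_to_context` are assumptions, and it only copes with flat, non-nested tagging.

```python
import re

# Hypothetical mapping from a few flat XHTML tags to ConTeXt markup.
# In a replacement template, "\\\\" (written r"\\") yields one backslash.
RULES = [
    (re.compile(r"<h1>(.*?)</h1>", re.S), r"\\section{\1}"),
    (re.compile(r"<em>(.*?)</em>", re.S), r"{\\em \1}"),
    (re.compile(r"<strong>(.*?)</strong>", re.S), r"{\\bf \1}"),
    (re.compile(r"<p>(.*?)</p>", re.S), r"\1\n"),
]

def xhtml_to_context(source: str) -> str:
    """Apply each substitution in turn.

    No escaping of TeX special characters is done, and nested
    occurrences of the same tag are not handled correctly.
    """
    for pattern, replacement in RULES:
        source = pattern.sub(replacement, source)
    return source

print(xhtml_to_context("<h1>Test</h1><p>questo <em>è</em> un test</p>"))
```

This is convenient for very simple drafts, but every new tag or nesting level needs another hand-written rule.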

Christopher Creutzig

3 Nov 2005

8:58 a.m.

andrea valle wrote:

3.2 translate to ConTeXt via a regular-expression-based Python script

I've been there and done that, and I won't do it again. Use a real XML parser; regular expression engines are a wonderful thing, but not really usable for parsing nested structures. I'm sure there are many, many XML parsers for Python to choose from.

regards,
Christopher
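As a rough sketch of the parser-based approach, the following uses `xml.etree.ElementTree` from the Python standard library and walks the tree recursively, so nested tags are handled naturally. The tag-to-ConTeXt mapping and the function names are assumptions made for this example, and the input is assumed to be well-formed XHTML without a namespace declaration.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag-to-ConTeXt mapping: (text before content, text after content).
WRAPPERS = {
    "h1": ("\\section{", "}\n"),
    "p": ("", "\n\n"),
    "em": ("{\\em ", "}"),
    "strong": ("{\\bf ", "}"),
}

def convert(element: ET.Element) -> str:
    """Recursively turn an element, its children and their tail text into ConTeXt."""
    before, after = WRAPPERS.get(element.tag, ("", ""))  # unknown tags: keep content only
    parts = [before, element.text or ""]
    for child in element:
        parts.append(convert(child))      # nested markup is handled by recursion
        parts.append(child.tail or "")    # text that follows the child's closing tag
    parts.append(after)
    return "".join(parts)

def xhtml_to_context(source: str) -> str:
    """Parse a well-formed XHTML fragment and convert it."""
    return convert(ET.fromstring(source))

print(xhtml_to_context("<body><p>a <em>b <strong>c</strong></em> d</p></body>"))
```

Supporting a new tag means adding one entry to the table; the traversal itself does not change.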

andrea valle

10:34 a.m.

I agree with you (and it's true that there are many Python XML parsers). Actually, in order to share drafts of documents (which is my minimal purpose) I need only very simple tagging, and it's convenient to use my little parser. When the complexity exceeds my (low) manageability threshold, I will move to the DOM API.

Best

-a-
