Probably not really a ConTeXt question, but maybe someone knows the answer.
Someone asked me how to convert a PDF to XML and back. The reason is that he has a PDF in English, but he would like to have it in Russian as well. His idea is to convert the PDF file to XML, translate the XML file with Google Translate, and convert the translated XML file back to PDF. He asked me how to do this. Of course it does not have to be an XML file: if Google Translate can work with a TeX file, there is no reason not to use that.
Does anyone know how to do this, or have pointers on how to do this?
-- Cecil Westerhof
2011/1/19 Cecil Westerhof
Someone asked me how to convert a PDF to XML and back. The reason is that he has a PDF in English, but he would like to have it in Russian as well. His idea [...]
It will be _much_ easier to get the original English sources and translate _them_ (and create a new PDF from the translation). Trust me.
Best
Martin
2011/1/19 Martin Schröder
It will be _much_ easier to get the original English sources and translate _them_ (and create a new PDF from the translation). Trust me.
That would be my guess too; it was my first comment to this person. ;-} But he wants to do it this way. His idea is to have standard PDFs on his website, let people choose the language they want, and have the document translated on the fly. He also has PDFs that he may redistribute, but for which he will not get the sources.
-- Cecil Westerhof
If your acquaintance actually needs an accurate translation into Russian, I wonder why he would choose Google Translate for that. Remember that Russian has 7 cases and a complex verbal system with many different forms, all of which Google has to deduce from the much poorer English prepositions and verbal forms in the text. Even though the result will no doubt show Cyrillic words, which looks interesting, the actual result will be rubbish, and most likely unintelligible to any Russian.
Regards,
Robert
On 19 Jan 2011, at 13:58, Cecil Westerhof wrote:
Probably not really a ConTeXt question, but maybe someone knows the answer.
Someone asked me how to convert a PDF to XML and back. The reason is that he has a PDF in English, but he would like to have it in Russian as well. His idea is to convert the PDF file to XML, translate the XML file with Google Translate, and convert the translated XML file back to PDF. He asked me how to do this. Of course it does not have to be an XML file: if Google Translate can work with a TeX file, there is no reason not to use that.
Does anyone know how to do this, or have pointers on how to do this?
-- Cecil Westerhof
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!
maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage : http://www.pragma-ade.nl / http://tex.aanhet.net
archive : http://foundry.supelec.fr/projects/contextrev/
wiki : http://contextgarden.net
___________________________________________________________________________________
2011/1/19 R. Ermers
If your acquaintance actually needs an accurate translation into Russian, I wonder why he would choose Google Translate for that. Remember that Russian has 7 cases and a complex verbal system with many different forms, all of which Google has to deduce from the much poorer English prepositions and verbal forms in the text. Even though the result will no doubt show Cyrillic words, which looks interesting, the actual result will be rubbish, and most likely unintelligible to any Russian.
I do not know if he requires Russian; he was talking about Ukrainian. But that may have the same problems. Automatic translation is always a problem. I do not even like the results from English to Dutch. But his reasoning is: 'better a badly translated document than no document'. I am not sure I agree 100%, but if that is what he wants, who am I to -keep- telling him he is wrong?
-- Cecil Westerhof
Even though the result will no doubt show Cyrillic words, which looks interesting, the actual result will be rubbish, and most likely unintelligible to any Russian.
That's an interesting statement; do you have any experience with that at all, or are you simply speculating? I have never heard any claim that machine translation would be more difficult for some particular languages. It's generally a hard problem, and each language has its specific issues, not only Russian (which has 6 cases, by the way, not 7, and really only one fully conjugated tense).
Arthur
Off topic:
Well, I speak Russian and some other languages. Yes, you are right: the 5th case is the locative (after o), the 6th case is the instrumental. One does not count the cases every day :-)
It is not a language in general that is difficult, but the pair a language is in: the pair English-Russian is, in some respects, more difficult than the other way around because of the choice between the perfective and the imperfective aspect of the tenses. An English text does not offer any clues as to which aspect to choose, but anyone who wants to speak Russian has to decide instantly. A program is unlikely to do that.
These problems might not exist for the pair Ukrainian-Russian, or perhaps (?) Polish-Russian, or - who knows - Basque-Russian. The options for determining the appropriate aspect, if programmers succeed in building them at all, are, for example, not needed in the pair English-Dutch.
The reversed pair Russian-English poses different problems, such as when and where to put an article. The program has to derive from the context whether a given Russian noun in the text should be interpreted as definite or indefinite, and then whether it is appropriate to put the article, etcetera.
Robert
It is not a language in general that is difficult, but the pair a language is in: the pair English-Russian is, in some respects, more difficult than the other way around because of the choice between the perfective and the imperfective aspect of the tenses. An English text does not offer any clues as to which aspect to choose, but anyone who wants to speak Russian has to decide instantly. A program is unlikely to do that.
I'm sorry, but that's pure speculation. It would be interesting to see research about machine translation for some particular language pairs, though.
Arthur
Still off topic:
Well, this is partly lexicological knowledge and partly research on translation. Each language pair and translation direction has its peculiar problems.
Whether or not you can call it "pure speculation" depends on how familiar you are with computational linguistics and its progress in determining semantic content from texts (step 1) and rephrasing it in a given target language (step 2).
I'm glad that you accept that it is about the pair and the direction of the translation.
Robert
A fun exercise is to put a text through google translate into any language, then pass the result back into the original language.
via Russian: Fun exercise is to put the text through Google Translate in any language, and then pass the result back to the original language.
via Japanese: Exercise is fun, Google is placing text via translation into other languages To pass the result to the original language.
via French (for Arthur): A fun exercise is to put a text through Google translate in any language, then pass the result in the original language.
...
Alan
--
Alan Braslau
CEA DSM-IRAMIS-SPEC
CNRS URA 2464
Orme des Merisiers
91191 Gif-sur-Yvette cedex
FRANCE
tel: +33 1 69 08 73 15
fax: +33 1 69 08 87 86
mailto:alan.braslau@cea.fr
More off topic:
A fun 'esercise be to put some text drough google translate into any language, den pass de result back into de o'iginal language. What it is, Mama!
(via jive, a good exercise in lex and yacc, very politically incorrect!)
-- Alan Braslau
Alan BRASLAU wrote:
A fun exercise is to put a text through google translate into any language, then pass the result back into the original language.
via Russian: Fun exercise is to put the text through Google Translate in any language, and then pass the result back to the original language.
The Russian translation is much worse than the English->Russian->English one.
On Wed, Jan 19, 2011 at 1:58 PM, Cecil Westerhof wrote:
Someone asked me how to convert a PDF to XML and back. [...] Of course it does not have to be an XML file: if Google Translate can work with a TeX file, there is no reason not to use that.
google for pdftotext
-- luigi
2011/1/19 luigi scarso
google for pdftotext
Already done. What looked the most promising was pdftohtml. Just wondering if there is a better way.
-- Cecil Westerhof
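A minimal sketch of the pdftotext route mentioned above, assuming Poppler's `pdftotext` is installed and on the PATH. The function names (`pdftotext_command`, `extract_text`, `split_paragraphs`) are made up for this illustration and nobody in the thread proposed them. The sketch only recovers plain text in chunks that could be handed to a translator; layout, graphics and structure are not preserved.

```python
import re
import shutil
import subprocess


def pdftotext_command(pdf_path, keep_layout=True):
    """Build the argument list for pdftotext, sending the text to stdout."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if keep_layout:
        cmd.append("-layout")  # ask pdftotext to keep the physical page layout
    cmd += [str(pdf_path), "-"]  # "-" as output file means stdout
    return cmd


def extract_text(pdf_path):
    """Run pdftotext and return the extracted text as a string."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    result = subprocess.run(
        pdftotext_command(pdf_path), capture_output=True, check=True
    )
    return result.stdout.decode("utf-8", errors="replace")


def split_paragraphs(text):
    """Split extracted text into paragraphs on blank lines and page breaks."""
    text = text.replace("\f", "\n\n")  # pdftotext separates pages with form feeds
    chunks = re.split(r"\n\s*\n", text)
    # Collapse line breaks inside a paragraph into single spaces.
    paragraphs = [" ".join(chunk.split()) for chunk in chunks]
    return [p for p in paragraphs if p]
```

Usage would be something like `split_paragraphs(extract_text("document.pdf"))`, where `document.pdf` is a placeholder file name. With `-layout`, multi-column pages may produce paragraphs that mix columns, so the flag may need to be switched off for such documents.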
On Wed, Jan 19, 2011 at 3:14 PM, Cecil Westerhof wrote:
Already done. What looked the most promising was pdftohtml. Just wondering if there is a better way.
What do you want exactly? Preserve structure? Formulas? Layout? Unless this information is embedded (tagged) in the PDF, you will have to do (a lot of) manual work.
Also google for pdfdraw mupdf
-- luigi
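Whether a given PDF carries the tagging mentioned above can be checked before investing any work. The following is a sketch only, assuming Poppler's `pdfinfo` is installed and that its report contains a `Tagged:` line; the helper names `parse_pdfinfo` and `is_tagged` are hypothetical. A "yes" only says that structure tags are present, not that they are complete or useful.

```python
import subprocess


def parse_pdfinfo(output):
    """Turn pdfinfo's 'Key:   value' lines into a dictionary."""
    info = {}
    for line in output.splitlines():
        # Split at the first colon only, so values such as dates stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def is_tagged(pdf_path):
    """Return True if pdfinfo reports the PDF as tagged."""
    result = subprocess.run(
        ["pdfinfo", str(pdf_path)], capture_output=True, text=True, check=True
    )
    # A missing "Tagged" line is treated as "no" (an assumption of this sketch).
    return parse_pdfinfo(result.stdout).get("Tagged", "no").lower() == "yes"
```

If `is_tagged` returns False, the structure has to be guessed from the page content, which is the manual work referred to above.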
2011/1/19 luigi scarso
What do you want exactly? Preserve structure? Formulas? Layout? Unless this information is embedded (tagged) in the PDF, you will have to do (a lot of) manual work.
My contact 'just' wants to translate the document. I already told him that this is easier said than done. But he is adamant. (Notwithstanding that several people have already given up on his quest.) I think structure and layout should be maintained. But I think it will mostly be 'simple' documents with text and some graphics, so I do not expect formula trouble.
Also google for pdfdraw mupdf
I will do that.
-- Cecil Westerhof
On Wed, Jan 19, 2011 at 3:35 PM, Cecil Westerhof wrote:
My contact 'just' wants to translate the document. I already told him that this is easier said than done. But he is adamant. (Notwithstanding that several people have already given up on his quest.) I think structure and layout should be maintained. But I think it will mostly be 'simple' documents with text and some graphics, so I do not expect formula trouble.
Hm, maybe you can have a look at Inkscape then (at least 0.48).
-- luigi
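For the last step of the plan from the start of the thread (turning translated text back into a PDF), one possibility on this list would be to generate a ConTeXt source file. The sketch below is only an illustration under several assumptions: the translated text is already available as a list of plain paragraphs, `escape_tex` and `context_source` are invented names, `ru` is used as the language tag, and a body font with Cyrillic glyphs still has to be set up separately. It produces a bare text document and does not reproduce the layout or graphics of the original PDF.

```python
# Characters that are special in TeX input, mapped to ConTeXt's \letter...
# commands. The trailing {} keeps TeX from swallowing a following space.
TEX_SPECIALS = {
    "\\": r"\letterbackslash{}",
    "{": r"\letterleftbrace{}",
    "}": r"\letterrightbrace{}",
    "%": r"\letterpercent{}",
    "$": r"\letterdollar{}",
    "#": r"\letterhash{}",
    "&": r"\letterampersand{}",
    "_": r"\letterunderscore{}",
    "^": r"\letterhat{}",
    "~": r"\lettertilde{}",
    "|": r"\letterbar{}",
}


def escape_tex(text):
    """Escape TeX special characters one by one, so nothing is escaped twice."""
    return "".join(TEX_SPECIALS.get(ch, ch) for ch in text)


def context_source(paragraphs, language="ru"):
    """Wrap already translated paragraphs in a minimal ConTeXt document."""
    body = "\n\n".join(escape_tex(p) for p in paragraphs)
    lines = [
        "\\mainlanguage[" + language + "]",
        "\\starttext",
        body,
        "\\stoptext",
    ]
    return "\n".join(lines) + "\n"
```

The returned string could be written to a file such as `translated.tex` (a placeholder name) and processed with `context translated.tex`.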
participants (7)
- Alan BRASLAU
- Arthur Reutenauer
- Cecil Westerhof
- luigi scarso
- Martin Schröder
- R. Ermers
- Yury G. Kudryashov