Design for Translation
Hi,

I am wondering how best to go about creating an evolving document that will need translation. This is a manual that will probably need to be produced - and maintained - in around 6 languages.

For what it is worth, I have come up with two approaches, which follow. But I would really appreciate any insights anyone may have.

...

1) Just translate the file
--------------------------

- Obey formatting rules in the document tex file so that commands are as easy as possible to tell apart visually from text.
- Distribute the files for translation with instructions.

This will work until we need to modify the file; then how do we communicate the modifications? An English diff?

I think this may be the best solution, but there is also

2) Using blocks
---------------

The "excursion" manual briefly describes an alternative (which I think may be too cumbersome). The idea would be to totally separate the text from any tex commands (except for a single type of begin/end sequence). Conceptually :-

manual-env.tex:

    \defineblock[EN,DE,IT]
    \setupblock[EN][file=EN]
    \setupblock[DE][file=DE]
    \setupblock[IT][file=IT]
    \doifmode{EN}{\def\lang{EN}}
    \doifmode{DE}{\def\lang{DE}}
    \doifmode{IT}{\def\lang{IT}}

manual.tex:

    \environment manual-env.tex
    \useblocks[\lang][installation-1]
    \useblocks[\lang][installation-2]

EN.tex:

    \beginEN[installation-1]
    This is how to install the product type 1
    \endEN
    \beginEN[installation-2]
    This is how to install the product type 2
    \endEN

DE.tex:

    \beginDE[installation-1]
    (german text here)
    \endDE
    \beginDE[installation-2]
    (german text here)
    \endDE

IT.tex:

    \beginIT[installation-1]
    (italian text here)
    \endIT
    \beginIT[installation-2]
    (italian text here)
    \endIT

-- 
John Devereux
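Editorial sketch: one practical consequence of the blocks approach is that a missing translation becomes machine-checkable, because every piece of text carries a tag. The helper below is only an illustration in Python, assuming the `\beginXX[tag]` ... `\endXX` layout from the conceptual example above; the function names are made up for this sketch and are not part of ConTeXt.

```python
import re


def block_tags(source: str, lang: str) -> list[str]:
    """Return the tags of all begin<LANG>[tag] blocks, in order of appearance."""
    pattern = re.compile(r"\\begin" + re.escape(lang) + r"\[([^\]]+)\]")
    return pattern.findall(source)


def missing_blocks(master: str, master_lang: str,
                   translation: str, lang: str) -> list[str]:
    """Tags that exist in the master text but not in the translation."""
    translated = set(block_tags(translation, lang))
    return [tag for tag in block_tags(master, master_lang)
            if tag not in translated]


# Sample content modelled on EN.tex / DE.tex above; the German file is
# deliberately missing its second block.
en_text = r"""
\beginEN[installation-1]
This is how to install the product type 1
\endEN
\beginEN[installation-2]
This is how to install the product type 2
\endEN
"""

de_text = r"""
\beginDE[installation-1]
(german text here)
\endDE
"""

print(missing_blocks(en_text, "EN", de_text, "DE"))   # ['installation-2']
```

Run against the real EN.tex and each translated file, a check like this would list the blocks a translator still has to supply after the English text has grown.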
John Devereux wrote:
But I would really appreciate any insights anyone may have.
I've got some experience on this. I'm sure my way is not optimal, but at least it is an experience.

First thing to remember is that I started the ConTeXt project many years (5???) ago and the program *and* its documentation have evolved a lot since then.

The other thing is that I didn't expect to have to deal with translations. The documentation (and the instrument the manual is for) were both supposed to exist in English only. Yeah. Sure. (We don't do consumer electronics, so regulations are a bit different than for stuff you buy in a shop.)

So over these years I've dealt with repeated "please send us the manual as Word file for translation" queries. Every time I've explained in words of one syllable that there is not and will not be a Word file, that the distributed manual file has origins in a totally different system. We stopped using Word when the file grew so big that Word just couldn't cope and when most of the figures to be included were pdf anyway and thus easily incorporated into ConTeXt files. I always offer to send the potential translators the files and the editing instructions and say that I can do a pdf out of the translated files any time, for example after each chapter. (BTW, if you'd like my editing instructions, I have them somewhere in rtf format.)

The reactions to the above information vary. A South American professional translator took the files without whining and turned in the translated Spanish text with only *two* messed up codes - which is a lot less of a mess than I do when editing. The French gave up directly; they supposedly have a Word version of their own of the manual and so do the Poles. Three other languages were written in Word (or similar) and I had all the fun in cutting-and-pasting the text into the ConTeXt files.
Italian was first done with cut-and-paste method, but then needed so much work that to my surprise they edited the ConTeXt files for me with a very good result - that's probably the most accurate translation of the whole lot.

The Russian version of our manual is in the works. They wanted to do it all by themselves, but I haven't heard anything since I debugged their last file (encoding problem, Win-cyrillic to UTF). I hope that means everything is under control there... They are basically working on a pared-down duplicate system so we can easily exchange files.

I should add that except for the South American translator and the Russians, the others are not IT people or nerds, and not necessarily even that computer literate (if their usage of Word is anything to go by....). If your translators are used to structural coding (html, for example) and especially if they already use suitable editors, you'll have a lot fewer problems.

Then the practical aspect. What I had from the beginning is a system where each chapter of the manual is a file of its own - makes it much easier to handle. Most of the formatting and setups are in the main file, so the chapters just contain a list of figures and then the text itself. This makes them much easier to edit and handle.

When I started getting the languages, I made subdirectories for them, one per language. This is where I put the tex files for that language + all the figures that have translated text in them; ConTeXt will look first in the same directory and then further afield, so if there's a translated figure, it will get used. If not, the figures of the English manual are used. That way I don't have to repeat anything that has no text in it - and the manuals compile from the beginning, first English everywhere and then little by little with translations.

I have a main format file for each language.
This is because of the language settings (hyphenation, labels), but also because the English manual is letter size and most translators prefer A4. Sometimes also one language only needs small adjustments (like we have no index in Italian), so I find it easier to keep all the layout stuff separate one language from another. However, the main layout had already been the same for two years when the first translation came along, otherwise I might move some information (like heading formatting) into a shared formatting/setup/layout file for easier changes.

So, how do I keep all of this up to date?

I don't. Fully. But if I could devote most of my working hours to that, I maybe could... Won't happen this decade, I think.

One thing that helps is version control (SVN) that keeps all the files and I try to document very carefully in the log files what I've done. As I usually check in all the files before leaving work, I still remember what was done and the log is reasonably good. SVN also means that I can diff with an earlier version of the file and see what changes were made, this is also handy.

Another thing is that any changes I can make myself, I'll do all over the manuals at one go. For example we had a small mistake in a protocol specification - fixing that didn't require much understanding of the language, just careful copying and pasting from another spot in the same file. Minor fixes in German and Swedish can also be done in-house.

Our manuals also have version numbers and unless it is revised, any translation keeps its original version number. For example the current English version is 1.64, but I'm doing cut-and-paste on an old translation that's version 1.51. On the front page of each translated manual we now have a disclaimer saying that if information in that manual and the English version conflicts, the English is considered correct.
[My ongoing cut-and-paste project is so bad that even with a three-word command of the language in question I can tell that it is a sorry translation... so the old version number isn't that much of a hindrance anyway...]

I assume that the only way to keep an evolving manual on track is careful logging of what was done, by whom and when. If translations involve a lot of people, it'll probably help to have clear instructions on what to do when there are changes. When the master language is fixed, you need to know who will log the changes and make sure that the translations are fixed accordingly and how that's done - by sending a new file to the translator, by telling them to check them out or by having them send those three lines to you by email so you can cut-and-paste them into your text.

But it is definitely worth the while to figure things out so that anything that doesn't need translation, only exists once and is used by all languages.

One very small thing that I came up with by the third translation language is that my files are named similarly, but with the language abbreviation first. So I have intro.tex (English), se-intro.tex (Swedish), es-intro.tex (Spanish) etc. First I thought this wasn't necessary as the files are in different directories (root directory intro.tex vs. Swedish/se-intro.tex vs. Spanish/es-intro.tex), but then I realized that it wasn't actually that fun to have three intro.tex tabs open in my Scite, couldn't tell fast which one was which. Occasionally the files also get copied to Desktop and sent all over by email and unique naming does make it easier to figure out what went and where.

Just my five cents,

Mari
Finland
On Wed, Mar 11, 2009 at 11:39:11PM +0200, Mari Voipio wrote:
John Devereux wrote:
But I would really appreciate any insights anyone may have.
I've got some experience on this. I'm sure my way is not optimal, but at least it is an experience.
And a very interesting one, Mari!

I'd like to also add *my* five cents. I've recently started (with a friend) to manage a, let's say, medium-sized TeX project. It is typesetting a journal for a scientific society. (We haven't actually typeset a single issue, but the first one is coming...)

One of the first things we did was to decide on a very strict policy of naming files. We also put them in different directories (one directory per paper and one big one for an issue), and name them accordingly (the point with naming files is *extremely* important - if scores of people send you files, renaming them according to a strict set of rules is a must!).

Another is to use a version control system; we went for Mercurial, since (a) I have another friend, who is a strong advocate of Python & Mercurial, so in case of trouble we have some support; (b) the only other VCS I know is RCS, and this is prehistory (and it copes only with single files, not directories!); (c) Mercurial's default (and indeed only) way of working is to commit changes etc. to a local copy of the repository and only then possibly communicate with a net repository now and then; this comes in rather handy if half the team does not have internet access at home (which is a deliberate decision of my friend, not an infrastructural problem in Poland;)). Believe me, with any larger project a reasonable VCS is also a must.

As for the contributors (be they authors, translators or anything): if they are not geeks, do not expect them to read any set of rules, user's guide or anything longer than 1-2 pages (optimally 0.5-1 page).

Regards

PS. I was very sorry to read about Poles... I apologize on behalf of your Polish collaborators for using Word. Keep in mind that here in Poland we have a very strong TeX team (LM, Gyre!), but it is still not the majority of the society;)...

-- 
Marcin Borkowski (http://mbork.pl)
888 * ostre słowa * ostra muzyka * ostra płyta
Mari Voipio wrote:
(BTW, if you'd like my editing instructions, I have them somewhere in rtf format.)
It turns out I was actually smart enough to add these into the version control when I revised the file some time ago. I checked the file and there's nothing that couldn't be published, so for now you can fetch the pdf version of my editing instructions at http://www.kpatents.com/pdf/support/manual_editing_instructions.pdf. The file will eventually disappear, but not this week.

These instructions are written for a "Windows dummy" who's at best edited HTML manually and at worst barely manages to open and save a file.

The rtf version of the instructions is available on request, if somebody has a use for it. The only thing that I ask for is that I get your finished instruction file/presentation/whatever in exchange (pdf is fine) as that'll help me improve mine.

Regards,

Mari
Mari Voipio wrote:
John Devereux wrote:
But I would really appreciate any insights anyone may have.
I've got some experience on this. I'm sure my way is not optimal, but at least it is an experience.
Hi Mari, thank you very much for such a long and detailed answer! Your experience very much reflects how I can see things going.
First thing to remember is that I started the ConTeXt project many years (5???) ago and the program *and* its documentation have evolved a lot since then.
Is there anything in particular new that you think might help?
The other thing is that I didn't expect to have to deal with translations. The documentation (and the instrument the manual is for) were both supposed to exist in English only. Yeah. Sure. (We don't do consumer electronics, so regulations are a bit different than for stuff you buy in a shop.)
So over these years I've dealt with repeated "please send us the manual as Word file for translation" queries. Every time I've explained in words of one syllable that there is not and will not be a Word file, that the distributed manual file has origins in a totally different system. We stopped using Word when the file grew so big that Word just couldn't cope and when most of the figures to be included were pdf anyway and thus easily incorporated into ConTeXt files. I always offer to send the potential translators the files and the editing instructions and say that I can do a pdf out of the translated files any time, for example after each chapter. (BTW, if you'd like my editing instructions, I have them somewhere in rtf format.)
Certainly, if it is convenient, thank you.
The reactions to the above information vary. A South American professional translator took the files without whining and turned in the translated Spanish text with only *two* messed up codes - which is a lot less of a mess than I do when editing. The French gave up directly; they supposedly have a Word version of their own of the manual and so do the Poles. Three other languages were written in Word (or similar) and I had all the fun in cutting-and-pasting the text into the ConTeXt files. Italian was first done with cut-and-paste method, but then needed so much work that to my surprise they edited the ConTeXt files for me with a very good result - that's probably the most accurate translation of the whole lot.
2 out of 7 is not what I was hoping for... I assume these were your agents or similar (rather than "professional" translators)? If so that is how we were hoping to do it too.
The Russian version of our manual is in the works. They wanted to do it all by themselves, but I haven't heard anything since I debugged their last file (encoding problem, Win-cyrillic to UTF). I hope that means everything is under control there... They are basically working on a pared-down duplicate system so we can easily exchange files.
I should add that except for the South American translator and the Russians, the others are not IT people or nerds, and not necessarily even that computer literate (if their usage of Word is anything to go by....). If your translators are used to structural coding (html, for example) and especially if they already use suitable editors, you'll have a lot fewer problems.
Not much chance of that I'm afraid. Although there's no reason I could not tell them to use Scite or similar. I was going to try saving the tex original as .doc - Word seems to open it OK - and then saving *their* end product as "encoded text/UTF8". Has anyone tried this?
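Editorial sketch: whatever editor the translators end up using, it may be worth checking the encoding of each returned file before trying to compile it. The Python function below is one possible check, not something used in the thread: it leaves valid UTF-8 alone and otherwise re-encodes from a legacy code page that has to be known in advance. The name `to_utf8` and the default of `cp1251` (Windows-Cyrillic) are assumptions made for illustration.

```python
def to_utf8(raw: bytes, fallback_encoding: str = "cp1251") -> bytes:
    """Return `raw` as UTF-8 bytes, re-encoding only when it is not valid UTF-8."""
    try:
        raw.decode("utf-8")      # already valid UTF-8: return it unchanged
        return raw
    except UnicodeDecodeError:
        # Not UTF-8: interpret the bytes in the known legacy encoding instead.
        return raw.decode(fallback_encoding).encode("utf-8")


# Simulate a file saved in Windows-Cyrillic and convert it.
legacy_bytes = "Руководство".encode("cp1251")
converted = to_utf8(legacy_bytes)
print(converted.decode("utf-8"))   # Руководство
```

The check cannot guess the legacy encoding, and in rare cases legacy bytes can happen to form valid UTF-8, so the result still deserves a quick look in an editor.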
Then the practical aspect. What I had from the beginning is a system where each chapter of the manual is a file of its own - makes it much easier to handle. Most of the formatting and setups are in the main file, so the chapters just contain a list of figures and then the text itself. This makes them much easier to edit and handle.
I was going to have a single environment file, which the translators never see, then a *single* document file. But perhaps separate chapters would be better... My document is not so big, maybe 30 pages of text (plenty of screen captures too). So perhaps my document is like one of your chapters. But there could be several other documents in the pipeline, so I am trying to come up with a workable approach.
When I started getting the languages, I made subdirectories for them, one per language. This is where I put the tex files for that language + all the figures that have translated text in them; ConTeXt will look first in the same directory and then further afield, so if there's a translated figure, it will get used. If not, the figures of the English manual are used. That way I don't have to repeat anything that has no text in it - and the manuals compile from the beginning, first English everywhere and then little by little with translations.
I have a main format file for each language. This is because of the language settings (hyphenation, labels), but also because the English manual is letter size and most translators prefer A4. Sometimes also one language only needs small adjustments (like we have no index in Italian), so I find it easier to keep all the layout stuff separate one language from another. However, the main layout had already been the same for two years when the first translation came along, otherwise I might move some information (like heading formatting) into a shared formatting/setup/layout file for easier changes.
I think I can do everything with one environment file and some modes. I also have a similarly functioning layout in mind to avoid duplication of common images, with fallbacks and so forth.
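Editorial sketch: the fallback idea (use the translated figure if one exists, otherwise the English one) can be modelled outside ConTeXt to see which figures a given language would actually pick up. The Python below is only a model of that lookup order, assuming one subdirectory per language under the English root as described earlier in the thread; `resolve_figure` and the file names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_figure(name: str, search_dirs: list[Path]) -> Optional[Path]:
    """Return the first existing file called `name`, trying the directories in order."""
    for directory in search_dirs:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


found = []
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)                  # stands in for the English manual directory
    swedish = root / "Swedish"        # one subdirectory per language
    swedish.mkdir()
    (root / "wiring.pdf").touch()     # figure without text: exists only once
    (root / "menu.pdf").touch()       # English screenshot
    (swedish / "menu.pdf").touch()    # translated screenshot
    for name in ("menu.pdf", "wiring.pdf"):
        hit = resolve_figure(name, [swedish, root])
        found.append(hit.relative_to(root).as_posix())

print(found)   # ['Swedish/menu.pdf', 'wiring.pdf']
```

Listing the resolved paths for every figure in a chapter shows at a glance which images are still English in a translated manual.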
So, how do I keep all of this up to date?
I don't. Fully. But if I could devote most of my working hours to that, I maybe could... Won't happen this decade, I think.
One thing that helps is version control (SVN) that keeps all the files and I try to document very carefully in the log files what I've done. As I usually check in all the files before leaving work, I still remember what was done and the log is reasonably good. SVN also means that I can diff with an earlier version of the file and see what changes were made, this is also handy.
Yes, I am already using version control (git). [...]
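Editorial sketch: with the English source under version control, the "English diff" idea from the first message amounts to sending translators only the changed lines plus a little context. The version control system can produce that directly; the Python below shows the same thing with the standard `difflib` module on two in-memory versions. The sample sentences and the file name are invented for illustration.

```python
import difflib


def english_diff(old: str, new: str, name: str = "intro.tex") -> str:
    """Return a unified diff of two versions of the English text."""
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    diff = difflib.unified_diff(
        old_lines, new_lines,
        fromfile=f"{name} (old)", tofile=f"{name} (new)",
        n=1,                      # one line of context around each change
    )
    return "".join(diff)


old_text = "Connect the sensor.\nSet the baud rate to 9600.\nPress Start.\n"
new_text = "Connect the sensor.\nSet the baud rate to 19200.\nPress Start.\n"

print(english_diff(old_text, new_text))
```

Lines starting with `-` are the old English wording and lines starting with `+` the new wording, which is usually enough for a translator to locate and update the matching sentence.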
I assume that the only way to keep an evolving manual on track is careful logging of what was done, by whom and when. If translations involve a lot of people, it'll probably help to have clear instructions on what to do when there are changes. When the master language is fixed, you need to know who will log the changes and make sure that the translations are fixed accordingly and how that's done - by sending a new file to the translator, by telling them to check them out or by having them send those three lines to you by email so you can cut-and-paste them into your text.
No "magic bullet" then.
But it is definitely worth the while to figure things out so that anything that doesn't need translation, only exists once and is used by all languages.
Very true - I have seen this when doing the software part of our project. It's even worse than I have been describing: we actually have to produce the software and manuals for several different vendors, with their own "branding", company logos, splash screens, icons etc. I have had to think through very carefully how to avoid having N_LANGUAGES*N_VENDORS separate copies of *everything* to maintain. The manuals are where everything comes to a head - here we really do need to produce N_LANGUAGES*N_VENDORS versions. I believe ConTeXt will be very useful in programmatically generating all the manual variants from a single source tree.
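Editorial sketch: generating N_LANGUAGES*N_VENDORS variants from one source tree could be driven by a small script that enables one language mode and one vendor mode per run. The Python below only builds the command lines; it assumes the `--mode` and `--result` options of the `context` runner, and the mode names, vendor names and output naming scheme are all hypothetical, so check them against the installed ConTeXt version before relying on them.

```python
from itertools import product

LANGUAGES = ["en", "de", "it"]        # hypothetical language mode names
VENDORS = ["vendor-a", "vendor-b"]    # hypothetical vendor mode names


def build_commands(languages, vendors, source="manual.tex"):
    """Return one `context` command line (as an argument list) per variant."""
    commands = []
    for lang, vendor in product(languages, vendors):
        result = f"manual-{vendor}-{lang}"          # assumed output naming scheme
        commands.append(
            ["context", f"--mode={lang},{vendor}", f"--result={result}", source]
        )
    return commands


for command in build_commands(LANGUAGES, VENDORS):
    print(" ".join(command))
```

Each argument list could be passed to `subprocess.run` in a build script; the environment file would then branch on the enabled modes to pick the language settings and the vendor branding.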
One very small thing that I came up with by the third translation language is that my files are named similarly, but with the language abbreviation first. So I have intro.tex (English), se-intro.tex (Swedish), es-intro.tex (Spanish) etc. First I thought this wasn't necessary as the files are in different directories (root directory intro.tex vs. Swedish/se-intro.tex vs. Spanish/es-intro.tex), but then I realized that it wasn't actually that fun to have three intro.tex tabs open in my Scite, couldn't tell fast which one was which. Occasionally the files also get copied to Desktop and sent all over by email and unique naming does make it easier to figure out what went and where.
A useful point, thanks. I *was* going to give them the same name.
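Editorial sketch: the naming rule described above (language abbreviation first, one subdirectory per language) is simple enough to write down as a function, which helps when files are renamed in bulk or checked on arrival. The mapping below contains only the two examples given in the thread; the function name is made up.

```python
from pathlib import PurePosixPath

# Language prefix -> subdirectory, taken from the examples in the thread.
LANGUAGE_DIRS = {"se": "Swedish", "es": "Spanish"}


def translated_path(chapter: str, prefix: str) -> PurePosixPath:
    """Map an English chapter file such as intro.tex to e.g. Swedish/se-intro.tex."""
    return PurePosixPath(LANGUAGE_DIRS[prefix]) / f"{prefix}-{chapter}"


print(translated_path("intro.tex", "se"))   # Swedish/se-intro.tex
print(translated_path("intro.tex", "es"))   # Spanish/es-intro.tex
```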
Just my five cents,
Thanks, -- John Devereux
John Devereux wrote:
First thing to remember is that I started the ConTeXt project many years (5???) ago and the program *and* its documentation have evolved a lot since then.
Is there anything in particular new that you think might help?
Compared with the situation a few years back, WinConTeXt MkIV is a breeze to install and update. And it works without fiddling, no font installation, no funny coding in the format file, nothing. Even the Cyrillic version worked when I just got the file encoding correct.

Even if they'll never compile the file, installing WinConTeXt is probably the easiest way to get SciTe with correct syntax highlighting. Except that you need to set UTF8 as default by yourself and not all users can get that far; the more patient ones will, though, if instructions are clear (go to menu x, click on y, write or copy to the file exactly the following line, save, close SciTe, reopen.)

Daydreaming in an off-topic direction: This would actually be really handy for people who need to edit .tex but never compile: a download-and-click-to-install SciTecumConTeXt package that had .tex defaulted to ConTeXt and had appropriate highlighting, but nothing else. Encoding could probably also be defaulted to UTF8 at this point or the installation could ask whether encoding is Windows standard or UTF and then put the default in as needed. If such a package existed, I could just dump the installation package and the files to be edited on a USB stick or CD and (snail)mail the whole thing to the person doing the translation. Most Windows users can handle the installation bit, if it is like installing for example Adobe Acrobat Reader.

I.e. I'm looking for something like the Notepad related free/shareware html editors that highlight but don't have much brain otherwise - while I use SciTe for my html work, my colleague has been happy with Notepad-something-or-other (and that finally rescued me and our code from the clutches of FrontPage).
2 out of 7 is not what I was hoping for... I assume these were your agents or similar (rather than "professional" translators)? If so that is how we were hoping to do it too.
Correct. One of those two has some background in programming, so he even managed the first chapter with Notepad (or something similar), although he whined that it was difficult without highlighting. The other one was an elderly consultant who may have dealt with computers in the time when text editing/layouting still involved similar coding. Or then he was just used to taking on any job that falls his way, at least he managed beautifully.

All or most of the others are marketing people who take one look at the editing instructions and give up. This happened even in-house where I would have been easy to reach for any help and even offered to install the system on his computer. No hope (I was reasonably annoyed with this one, cutting-and-pasting that language took me like three weeks...).

The funniest (but sad) thing is that after they give up on .tex, I also offer to convert the graphics into formats that will be easy to insert into Word if they want to do a translation of their own, but they've *never* taken me up on that offer yet. Instead they apparently just cut and paste the graphics from our official pdf and the result is usually not as good quality as it would have been with the stuff I send. The one I'm now working on is pretty .... sad. Although not as sad as the 20 meg version I received a year earlier that had tracking on, too (thankfully Word2007 now has an easy cleaning function with which I got rid of the 10 megs of revisions).
I was going to try saving the tex original as .doc - word seems to open it OK - and then saving *their* end product as "encoded text/UTF8". Has anyone tried this?
I think I tried this with the Russian test file and it worked, but I can retry (I've got Word 2003). The very important bit is to choose "encoded text" as the save format - Word also lets you pick plain text + UTF8, but then the file just isn't saved as UTF8 (confusing or what????). Been there, done that....

Also, the tip I got here for a character converter, charsc, was pretty good. Can't quite figure out the command line version, but I can get it to work if I know the original encoding (it couldn't guess at Windows-Cyrillic, IsoLatin1 went better). It is downloadable at http://www.kalytta.com/tools.php.

Also, a caveat about Word. As my editing instructions say, you really have to have all AutoXxx features turned off or your tex code can get pretty fishy. In the worst case the autocorrect features do things to your parentheses and even in the best case you get to fight with things like "..." turning into a real ellipsis character and possibly getting mussed up later in conversion. If you can talk your translators into using WordPad instead (if they are not up to trying SciTe), it will make your life easier in the long run. *You* can still open the resulting .rtf file in Word and convert it into UTF8 as long as your Word has been tamed.

Also, when you try to compile the translated file, expect to get stuck at any % or & that are in the running text. They never remember to put that backslash in front of these...
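Editorial sketch: the unescaped % and & problem can be caught before compiling with a rough scan of the translated file. The Python below is a heuristic written for illustration, not a TeX parser: it flags every `&` without a preceding backslash, but flags `%` only when it directly follows a digit (as in "5%"), because a bare `%` may be an intentional comment. The sample lines are invented.

```python
import re

# "&" not preceded by a backslash, or "%" directly after a digit.
UNESCAPED = re.compile(r"(?<!\\)&|(?<=\d)%")


def find_unescaped(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, character) for each suspicious % or &, 1-based."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in UNESCAPED.finditer(line):
            hits.append((lineno, match.start() + 1, match.group()))
    return hits


sample = "\n".join([
    r"Accuracy is 5% at full scale.",            # unescaped percent sign
    r"Use cables & connectors.",                 # unescaped ampersand
    r"Range 0\% to 100\%. % trailing comment",   # correctly escaped, plus a comment
])

print(find_unescaped(sample))   # [(1, 14, '%'), (2, 12, '&')]
```

A percent sign after a letter or space is not reported, so the scan narrows down where to look rather than guaranteeing a clean compile.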
I was going to have a single environment file, which the translators never see, then a *single* document file. But perhaps separate chapters would be better... My document is not so big, maybe 30 pages of text (plenty of screen captures too). So perhaps my document is like one of your chapters. But there could be several other documents in the pipeline, so am trying to come up with a workable approach.
We have another, 30 page manual, that was done after the big one (which wasn't quite so big then, we started with three models...), but it has the same type of division. I just find it much easier to find the stuff to be fixed when the file length is limited. And the graphics take space in the code, too, sometimes almost as much as on the page in reality (if they are smallish).

Besides, when I check the files in before leaving work, the names of the files jog my memory about what changes I made and thus the version control log becomes more accurate. (Instead of just changing manual.tex I change intro.tex and sensorspecs.tex and ethernet.tex and all of these get flagged when checking files back in.) And in case of a gigantic mess-up I can just revert the file for that chapter and lose those changes, but everything else I did the same day is safe. Been there, done that too.... The longer the individual file is, the more you have to lose if things go wrong.
I think I can do everything with one environment file and some modes. I also have a similarly functioning layout in mind to avoid duplication of common images, with fallbacks and so forth.
If I started now, I'd use modes, too. I once tried, but didn't understand enough to really get it to work and then the manual was already pretty big. I'll probably try again with the newer and much smaller manual at some point (when I have a bit of time to study, but before somebody wants to translate it into any other language).

Sharing the misery,

Mari
Participants (3): John Devereux, Marcin Borkowski, Mari Voipio