Just a quick reply (it's bedtime over here): there may be two problems. The first is that the mail program put in an unwanted line break after the =~ part; just remove it, it should all be one line. The second: you'll need a fairly recent version of perl for it to work. What do you get when you run perl --version? I guess for utf to work, it should be at least 5.8.0.

Your basic idea of the usage is right (I'm not a Windows person, but I assume it should be the same): save the script as utf2tex.pl, make it executable and call it as utf2tex.pl FILENAME.txt.

I guess it would be easiest to convert the utf to ascii directly - that would mean you could later convert it back. I have a set of scripts that do just that -- convert babel Greek into utf-8 and back.

If you need more help, I'll look into it tomorrow!

Best

Thomas

On Sat, 2004-06-05 at 23:33, Idris Samawi Hamid wrote:
On Sat, 05 Jun 2004 22:41:39 +0200, Thomas A. Schmitz wrote:
Idris,
I know a bit of perl and would love to help. However, I fear that sending us your stuff via mail will be a bit difficult because the utf-8 characters get transformed into gibberish.
Thnx 4 such a speedy reply! I don't think you are getting gibberish though; you should be getting the extended ascii representation. So the letter alif (hex 0627) should look like this:
ا
Do you get a forward-slashed circle and a section symbol? If so, that's the ascii representation I'm trying to convert to the letter `A'.
Here are the codes you want:
ا [0627] => A
ب [0628] => b
ج [062C] => j
د [062F] => d
Ù‡ [0647] => h
Ùˆ [0648] => w
ز [0632] => z
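For reference, the table above can be sketched as a lookup in Perl. Each two-character pair in the left column is what the two UTF-8 bytes of one Arabic letter look like when every byte is displayed as its own character, so decoding the bytes first gives back the code point in brackets. This is only a minimal sketch, assuming Perl 5.8 or later (for the core Encode module); the hash name `%translit` and the helper `transliterate` are made up for illustration and are not part of Thomas's script.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Mapping from the table above: Arabic code point => Latin transcription.
my %translit = (
    "\x{0627}" => 'A',   # alif
    "\x{0628}" => 'b',   # beh
    "\x{062C}" => 'j',   # jeem
    "\x{062F}" => 'd',   # dal
    "\x{0647}" => 'h',   # heh
    "\x{0648}" => 'w',   # waw
    "\x{0632}" => 'z',   # zain
);

# Hypothetical helper: takes raw UTF-8 bytes, returns the transcription.
# Characters that are not in the table are passed through unchanged.
sub transliterate {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);
    $text =~ s/(.)/exists $translit{$1} ? $translit{$1} : $1/ge;
    return $text;
}

# D8 A7, D8 A8, D8 AC are the UTF-8 bytes of alif, beh, jeem.
print transliterate("\xD8\xA7\xD8\xA8\xD8\xAC"), "\n";   # prints "Abj"
```

Keeping the table in one hash means new letters can be added without touching the substitution itself.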
Let me explain my situation more clearly:-)
I have a unicode editor, Unitype Global Writer. I save a unicode document as a utf *.txt file. When I open that saved file in my TeX editor (WinEdt), it comes out as extended ascii (that's the "gibberish"). So what I wanted to do was convert the ascii "gibberish" to my Latin transcription. It seems that what you are suggesting is to use the hex representation and convert the unicode txt file into a Latin transcription file directly and bypass the gibberish.
On your perl file, can you give me an example of how to use it? I tried (in windows, with name utf2tex.pl and unicode text in unicode-utf.txt) and get
=========================
perl utf2tex.pl unicode-utf.txt
Unknown discipline class ':utf8' at C:/Perl/lib/open.pm line 18.
BEGIN failed--compilation aborted at utf2tex.pl line 4.
=========================
from your script I tried, e.g.
============================
$_ =~ s/\x{0627}/\x{0041}/esg; # from alif to `A'
============================
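As a point of comparison, here is a minimal sketch of the same substitution on a string that already holds decoded characters. It assumes Perl 5.8 or later, and it drops the /e modifier: /e makes Perl evaluate the replacement as code, which is not needed when the replacement is just the letter `A'. The variable `$line` and the sample input are invented for the example.

```perl
use strict;
use warnings;

# Sample input: alif (U+0627) followed by beh (U+0628), as decoded characters.
my $line = "\x{0627}\x{0628}";

# Plain substitution, no /e: the replacement is a literal string.
$line =~ s/\x{0627}/A/g;   # from alif to `A'

# Show the result as code points so no wide characters are printed.
print join(' ', map { sprintf 'U+%04X', ord } split //, $line), "\n";
# prints "U+0041 U+0628"
```

The \x{...} escapes match characters, not bytes, so the input has to be decoded from UTF-8 before a substitution like this can find anything.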
Your guidance will be greatly appreciated!
Thnx a million!
Idris