On 2012-02-10 12:11, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
... Well, my information was not correct.
There are characters > 127 in the file, like "ř", "š"...
Each char = 1 byte, and as I'm using Windows with CP 1250, the characters are displayed correctly.
So it wasn’t ASCII after all ;-)
No problem, just use iconv:

    iconv -f CP1250 -t UTF8 infile > outfile

I do this a lot with movie subtitles …

Hth,
Philipp

PS: If you still insist on converting at the Lua end only then your starting point might be “regi-cp1250.lua” in the ConTeXt base/ dir.
But I have a problem loading them into ConTeXt.
I need to convert the bytes > 127 to UTF-8 sequences that ConTeXt will accept.
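A minimal sketch of such a byte-wise conversion in plain Lua, assuming the input really is CP 1250. The mapping table is deliberately partial (only a few Czech letters), and the names `cp1250_to_unicode`, `utf8_encode` and `cp1250_to_utf8` are made up for illustration; they are not ConTeXt functions. A complete table would have to cover all 128 upper code positions.

```lua
-- Partial CP1250 -> Unicode code point map (illustrative only;
-- just a handful of Czech letters, not the full code page).
local cp1250_to_unicode = {
  [0xE1] = 0x00E1, -- á
  [0xE8] = 0x010D, -- č
  [0xE9] = 0x00E9, -- é
  [0xEC] = 0x011B, -- ě
  [0xED] = 0x00ED, -- í
  [0xF8] = 0x0159, -- ř
  [0x9A] = 0x0161, -- š
  [0x9E] = 0x017E, -- ž
}

-- Encode a code point below U+0800 as UTF-8 (one or two bytes).
local function utf8_encode(cp)
  if cp < 0x80 then
    return string.char(cp)
  end
  assert(cp < 0x800, "this sketch only handles code points below U+0800")
  return string.char(0xC0 + math.floor(cp / 64), 0x80 + cp % 64)
end

-- Hypothetical helper: convert a CP1250 byte string to UTF-8.
-- Bytes 0-127 pass through unchanged; unmapped bytes raise an error.
local function cp1250_to_utf8(str)
  return (str:gsub("[\128-\255]", function(ch)
    local cp = cp1250_to_unicode[ch:byte()]
    if not cp then
      error(string.format("no mapping for byte 0x%02X", ch:byte()))
    end
    return utf8_encode(cp)
  end))
end

-- Input is "řešení" as CP1250 bytes: ř=248, š=154, í=237.
print(cp1250_to_utf8("\248e\154en\237"))
```

Raising an error on unmapped bytes is a choice made here so that gaps in the partial table do not pass silently; a real conversion might prefer a replacement character instead.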
@Thomas:
The table looks nice but there are no entries for CP 1250 to UTF conversion.
I prepared some tables: character conversion and removal of diacritics (see the attachment); maybe it would be handy to include them in ConTeXt somehow.
Best regards,
Lukas
On Fri, 10 Feb 2012 11:57:32 +0100, Philipp Gesang wrote:
On 2012-02-10 11:22, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
Hello,
I have many files with ASCII encoding; this encoding must be kept as these files are processed also by another program.
When I work with them in ConTeXt, I need to convert them to UTF.
Not needed, as every ASCII string is a valid UTF8 string:

    “The UTF encoding has several good properties. By far the most important is that a byte in the ASCII range 0-127 represents itself in UTF. Thus UTF is backward compatible with ASCII.”
    http://doc.cat-v.org/plan_9/4th_edition/papers/utf

You can use them in Luatex without further conversion.
Regards Philipp
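To check whether a file really is pure ASCII, and therefore needs no conversion, one could scan it for bytes above 127. A small sketch in plain Lua; `is_pure_ascii` is a name invented here, not something ConTeXt provides:

```lua
-- Returns true when every byte of str lies in the ASCII range 0-127,
-- i.e. when the string is already valid UTF-8 without any conversion.
local function is_pure_ascii(str)
  return str:find("[\128-\255]") == nil
end

print(is_pure_ascii("plain text"))      -- only bytes below 128
print(is_pure_ascii("p\248\237klad"))   -- contains bytes 248 and 237
```

If the check fails, the file is in some 8-bit encoding (CP 1250 in this thread) and has to be converted before LuaTeX reads it.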
Does Lua (in ConTeXt scope) offer a transformation function or a table of chars [ASCII-code] -> [UTF-code] or anything to provide the conversion?
Something like:
\startluacode
local str = loadFile("a.txt") -- ASCII coded
str = context.ASCII2UTF(str) -- Or something like this
\stopluacode
Best regards,
Lukas
-- Ing. Lukáš Procházka [mailto:LPr@pontex.cz] Pontex s. r. o. [mailto:pontex@pontex.cz] [http://www.pontex.cz] Bezová 1658 147 14 Praha 4
Tel: +420 244 062 238 Fax: +420 244 461 038
___________________________________________________________________________________ If your question is of interest to others as well, please add an entry to the Wiki!
maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context webpage : http://www.pragma-ade.nl / http://tex.aanhet.net archive : http://foundry.supelec.fr/projects/contextrev/ wiki : http://contextgarden.net ___________________________________________________________________________________