2012/2/10 Procházka Lukáš Ing. - Pontex s. r. o.
... Well, my information was not correct.
There are characters > 127 in the file, like "ř", "š"...
Each char = 1 byte, and as I'm using Windows with CP 1250, the characters are displayed correctly.
But I have a problem loading them into ConTeXt.
I need to convert the bytes > 127 to UTF-8 sequences that ConTeXt will accept.
@Thomas:
The table looks nice but there are no entries for CP 1250 to UTF conversion.
I prepared some tables: character conversion and removal of diacritics (see the attachment); maybe it would be handy to include them in ConTeXt somehow.
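For readers without the attachment, here is a minimal sketch of how two such tables could be generated. It is not the attached file: it is written in Python, relies on Python's built-in `cp1250` codec and `unicodedata` module, and the function name `build_cp1250_tables` is invented for this illustration.

```python
import unicodedata


def build_cp1250_tables():
    """Return (to_utf8, to_plain) lookup tables for CP 1250 bytes 128..255.

    Hypothetical helper for illustration only.
    """
    to_utf8 = {}   # byte value -> UTF-8 byte sequence
    to_plain = {}  # byte value -> character with combining marks removed
    for byte in range(128, 256):
        try:
            char = bytes([byte]).decode("cp1250")
        except UnicodeDecodeError:
            continue  # a few byte values are unassigned in CP 1250
        to_utf8[byte] = char.encode("utf-8")
        # NFD splits a letter into its base letter plus combining marks;
        # dropping the marks leaves the undecorated letter.
        decomposed = unicodedata.normalize("NFD", char)
        to_plain[byte] = "".join(
            c for c in decomposed if not unicodedata.combining(c)
        )
    return to_utf8, to_plain


to_utf8, to_plain = build_cp1250_tables()
```

Letters that have no Unicode decomposition (for example "ł") pass through the second table unchanged, so a hand-written table may still be needed for them.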
Best regards,
Lukas
From Wikipedia:
"""
Unicode and the ISO/IEC 10646 Universal Character Set (UCS) have a much wider array of characters, and their various encoding forms have begun to supplant ISO/IEC 8859 and ASCII rapidly in many environments. While ASCII is limited to 128 characters, Unicode and the UCS support more characters by separating the concepts of unique identification (using natural numbers called code points) and encoding (to 8-, 16- or 32-bit binary formats, called UTF-8, UTF-16 and UTF-32). To allow backward compatibility, the 128 ASCII and 256 ISO-8859-1 (Latin 1) characters are assigned Unicode/UCS code points that are the same as their codes in the earlier standards. Therefore, ASCII can be considered a 7-bit encoding scheme for a very small subset of Unicode/UCS, and, conversely, the UTF-8 encoding forms are binary-compatible with ASCII for code points below 128, meaning all ASCII is valid UTF-8. The other encoding forms resemble ASCII in how they represent the first 128 characters of Unicode, but use 16 or 32 bits per character, so they require conversion for compatibility. (similarly UCS-2 is upwards compatible with UTF-16)
"""
To avoid confusion: if you mean ASCII with code range 0-127, there is no need for conversion; if you mean "ASCII" with code range 0-255 *and* ISO-8859-1 (Latin 1) encoding, there is no need for conversion; otherwise you need to specify an encoding (e.g. CP 1250).
If you have iconv, converting between encodings is easy; you can always call it as an external program with os.execute(cmd).
--
luigi
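As a rough illustration of the conversion itself, the sketch below re-encodes CP 1250 bytes as UTF-8, which is approximately what `iconv -f CP1250 -t UTF-8` does on the command line. It is written in Python rather than Lua, and the function name `cp1250_to_utf8` is an assumption made up for this example, not something provided by ConTeXt.

```python
def cp1250_to_utf8(data: bytes) -> bytes:
    """Re-encode CP 1250 (Windows-1250) bytes as UTF-8.

    Hypothetical helper for illustration only.
    """
    # Step 1: map each single byte to its Unicode code point.
    # Step 2: write those code points out as UTF-8 sequences.
    # Byte values that CP 1250 leaves unassigned raise UnicodeDecodeError.
    return data.decode("cp1250").encode("utf-8")


# In CP 1250, "ř" is the single byte 0xF8 and "š" is the single byte 0x9A.
sample = bytes([0xF8, 0x9A])
converted = cp1250_to_utf8(sample)
```

Bytes below 128 come out unchanged, in line with the quoted remark that all ASCII is valid UTF-8; only the bytes above 127 grow into multi-byte sequences.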