Re: [NTG-context] UTF conversion via Lua

10 Feb 2012

      2012/2/10 Procházka Lukáš Ing. - Pontex s. r. o. :
...
... Well, my information was not correct.
There are characters > 127 in the file, like "ř", "š"...
Each char = 1 byte, and as I'm using Windows with CP 1250, the characters
are displayed correctly.
But I have a problem loading them into ConTeXt.
I need to convert the bytes > 127 to UTF-8 sequences that ConTeXt
will accept.
@Thomas:
The table looks nice but there are no entries for CP 1250 to UTF conversion.
I prepared some tables: character conversion and removal of diacritics (see
the attachment);
maybe it would be handy to include them in ConTeXt somehow.
Best regards,
Lukas
...
From wikipedia
"""
Unicode and the ISO/IEC 10646 Universal Character Set (UCS) have a
much wider array of characters, and their various encoding forms have
begun to supplant ISO/IEC 8859 and ASCII rapidly in many environments.
While ASCII is limited to 128 characters, Unicode and the UCS support
more characters by separating the concepts of unique identification
(using natural numbers called code points) and encoding (to 8-, 16- or
32-bit binary formats, called UTF-8, UTF-16 and UTF-32).
To allow backward compatibility, the 128 ASCII and 256 ISO-8859-1
(Latin 1) characters are assigned Unicode/UCS code points that are the
same as their codes in the earlier standards. Therefore, ASCII can be
considered a 7-bit encoding scheme for a very small subset of
Unicode/UCS, and, conversely, the UTF-8 encoding forms are
binary-compatible with ASCII for code points below 128, meaning all
ASCII is valid UTF-8. The other encoding forms resemble ASCII in how
they represent the first 128 characters of Unicode, but use 16 or 32
bits per character, so they require conversion for compatibility.
(similarly UCS-2 is upwards compatible with UTF-16)
"""
To avoid confusion:
If you mean ASCII with code range 0-127, there is no need for conversion;
if you mean ASCII with code range 0-255 *and* ISO-8859-1 (Latin 1)
encoding, there is no need for conversion;
otherwise you need to specify an encoding (e.g. CP 1250).
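For the CP 1250 case, a table-driven conversion in plain Lua could look
roughly like the sketch below. It is only an illustration: the function
names are made up here, and the table lists just four characters
(š, ž, č, ř), so a real table would need all 128 upper entries. Note that
every byte above 127 becomes a two-byte sequence in UTF-8, even where the
code point keeps the same number.

```lua
-- Partial CP 1250 -> Unicode code point table (illustrative, NOT complete).
local cp1250_to_unicode = {
  [0x9A] = 0x0161, -- š
  [0x9E] = 0x017E, -- ž
  [0xE8] = 0x010D, -- č
  [0xF8] = 0x0159, -- ř
}

-- Encode a code point below U+0800 as UTF-8 (one or two bytes).
local function utf8_encode(cp)
  if cp < 0x80 then
    return string.char(cp)
  elseif cp < 0x800 then
    return string.char(0xC0 + math.floor(cp / 0x40), 0x80 + cp % 0x40)
  end
  error("code point out of range for this sketch")
end

-- Replace every byte > 127 by the UTF-8 sequence of its mapped code point.
local function cp1250_to_utf8(s)
  return (s:gsub("[\128-\255]", function(c)
    local cp = cp1250_to_unicode[c:byte()]
    if not cp then
      error(string.format("no mapping for byte 0x%02X", c:byte()))
    end
    return utf8_encode(cp)
  end))
end

-- The bytes F8 65 E8 spell "řeč" in CP 1250.
print(cp1250_to_utf8("\248e\232"))
```

Bytes that are missing from the table raise an error instead of passing
through silently, so gaps in the table show up immediately.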
If you have iconv, converting between encodings is easy: you can always
call it as an external program with os.execute(cmd)
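
A minimal sketch of that call, assuming iconv is on the PATH and accepts
the name CP1250 (GNU iconv does). The file names and the helper
iconv_command are placeholders made up for this example:

```lua
-- Build the iconv command line; -f is the source encoding, -t the target.
local function iconv_command(infile, outfile, from, to)
  return string.format('iconv -f %s -t %s "%s" > "%s"', from, to, infile, outfile)
end

-- Hypothetical file names, replace with your own.
local cmd = iconv_command("input-cp1250.txt", "output-utf8.txt", "CP1250", "UTF-8")
print(cmd)

-- Uncomment to actually run the conversion (needs iconv installed):
-- os.execute(cmd)
```

The return value of os.execute differs between Lua versions, so it is
safer to check that the output file exists than to rely on it.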

-- 
luigi


luigi scarso