Hello,

does ConTeXt contain a built-in Lua conversion routine to convert text in UTF-16 (and others) to a UTF-8 string?

Something like:

----
\startluacode
local str = "A unicode string" -- Or e.g. string loaded from a file
str = convert(str, "utf16", "utf8")
context(str)
\stopluacode
----

TIA.

Best regards,

Lukas

--
Ing. Lukáš Procházka | mailto:LPr@pontex.cz
Pontex s. r. o. | mailto:pontex@pontex.cz | http://www.pontex.cz | IDDS:nrpt3sn
Bezová 1658
147 14 Praha 4
Tel: +420 241 096 751 (+420 720 951 172)
Fax: +420 244 461 038
On 1/27/2017 1:48 PM, Procházka Lukáš Ing. wrote:
does ConTeXt contain a built-in Lua conversion routine to convert text in UTF-16 (and others) to a UTF-8 string?
utf.utf16_to_utf8_le
utf.utf16_to_utf8_be
utf.utf32_to_utf8_le
utf.utf32_to_utf8_be

Normally, when files are in UTF-16 and have a BOM, they will be dealt with properly.

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
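For readers who want to see what such a conversion involves, here is a minimal standalone sketch in plain Lua. It is not ConTeXt's implementation; the helper names `utf8char` and `utf16le_to_utf8` are hypothetical and exist only for this illustration. It assumes little-endian input, skips an optional byte order mark, and combines surrogate pairs.

```lua
-- Encode one Unicode code point as UTF-8 using plain arithmetic,
-- so the sketch does not depend on the Lua 5.3 utf8 library.
local function utf8char(cp)
  local floor, char = math.floor, string.char
  if cp < 0x80 then
    return char(cp)
  elseif cp < 0x800 then
    return char(0xC0 + floor(cp / 0x40), 0x80 + cp % 0x40)
  elseif cp < 0x10000 then
    return char(0xE0 + floor(cp / 0x1000),
                0x80 + floor(cp / 0x40) % 0x40,
                0x80 + cp % 0x40)
  else
    return char(0xF0 + floor(cp / 0x40000),
                0x80 + floor(cp / 0x1000) % 0x40,
                0x80 + floor(cp / 0x40) % 0x40,
                0x80 + cp % 0x40)
  end
end

-- Hypothetical helper: UTF-16 little-endian byte string -> UTF-8 string.
local function utf16le_to_utf8(s)
  local out, i, n = {}, 1, #s
  -- Skip a leading byte order mark (FF FE) if present.
  if s:sub(1, 2) == "\255\254" then i = 3 end
  while i + 1 <= n do
    local lo, hi = s:byte(i, i + 1)
    local unit = lo + hi * 256
    i = i + 2
    if unit >= 0xD800 and unit <= 0xDBFF and i + 1 <= n then
      -- High surrogate: combine with a following low surrogate, if any.
      local lo2, hi2 = s:byte(i, i + 1)
      local low = lo2 + hi2 * 256
      if low >= 0xDC00 and low <= 0xDFFF then
        unit = 0x10000 + (unit - 0xD800) * 0x400 + (low - 0xDC00)
        i = i + 2
      end
    end
    -- Unpaired surrogates and a trailing odd byte are not validated here.
    out[#out + 1] = utf8char(unit)
  end
  return table.concat(out)
end
```

Inside ConTeXt the built-in functions named above would presumably take the place of the hypothetical helper; their exact argument conventions are not stated in this thread.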
Hello Hans,
that's it, thank you!
Best regards,
Lukas
On Fri, 27 Jan 2017 14:08:42 +0100, Hans Hagen wrote:
utf16_to_utf8_le
Hello,
is there also a way to convert CP1250 to UTF8 and vice versa?
Best regards,
Lukas
--
Ing. Lukáš Procházka | mailto:LPr@pontex.cz
Pontex s. r. o. | mailto:pontex@pontex.cz | http://www.pontex.cz | IDDS:nrpt3sn
Bezová 1658
147 14 Praha 4
Mob.: +420 702 033 396
Tel.: +420 241 096 751

___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________
On 6/14/2017 2:07 PM, Procházka Lukáš Ing. wrote:
Hello,
is there also a way to convert CP1250 to UTF8 and vice versa?
regimes.toregime('8859-1',"abcde Ä","?")

there's also fromregime
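The two directions mentioned in Hans's reply (into a regime, with a fallback character such as "?", and back out of a regime) can be pictured with a small standalone sketch in plain Lua. This is not how ConTeXt implements regimes; the table covers only a handful of Czech letters from CP1250, and the helper names `from_cp1250` and `to_cp1250` are hypothetical. The source file is assumed to be saved as UTF-8.

```lua
-- Partial CP1250 -> UTF-8 table (a few Czech letters only, for illustration).
local cp1250_to_utf8 = {
  [0x9A] = "š", [0x9E] = "ž", [0xE1] = "á", [0xE8] = "č", [0xE9] = "é",
  [0xEC] = "ě", [0xED] = "í", [0xF8] = "ř", [0xF9] = "ů", [0xFD] = "ý",
}

-- Reverse table (UTF-8 sequence -> single CP1250 byte), built once.
local utf8_to_cp1250 = {}
for byte, chars in pairs(cp1250_to_utf8) do
  utf8_to_cp1250[chars] = string.char(byte)
end

-- Hypothetical helper: CP1250 bytes -> UTF-8 (the "from regime" direction).
-- High bytes missing from the partial table become "?".
local function from_cp1250(s)
  return (s:gsub("[\128-\255]", function(c)
    return cp1250_to_utf8[c:byte()] or "?"
  end))
end

-- Hypothetical helper: UTF-8 -> CP1250 bytes (the "to regime" direction).
-- Characters without a CP1250 slot in the table are replaced by `default`.
local function to_cp1250(s, default)
  -- Match one UTF-8 lead byte together with its continuation bytes.
  return (s:gsub("[\192-\247][\128-\191]*", function(c)
    return utf8_to_cp1250[c] or default
  end))
end
```

With this sketch, `to_cp1250("čeština", "?")` yields the seven CP1250 bytes of the word, and `from_cp1250` turns them back into UTF-8; a character outside the table, such as "€", comes out as the fallback "?".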
OK, thank you;
I deduce:
regimes.toregime('8859-1',"abcde Ä","?")
means actually:
- "convert from the current regime" (e.g. UTF-8)
- regimes.toregime(<regime-to-convert-to>, <string-to-convert>, )

Lukas
On 06/14/2017 02:31 PM, Procházka Lukáš Ing. wrote:
OK, thank you; I deduce:
regimes.toregime('8859-1',"abcde Ä","?")
means actually:
- "convert from the current regime" (e.g. UTF-8)
- regimes.toregime(<regime-to-convert-to>, <string-to-convert>, )

Hi Lukas,

-- Usage:
-- regimes.toregime(<target-encoding>, <text>, <character-on-failure>)

Just in case it helps,

Pablo
--
http://www.ousia.tk
Hello,
thank you for the answers.
So - IMHO the following code should provide CP-1250 to UTF-8 conversion:
----
\enableregime[cp1250]
\startluacode
local cvt = function(fn)
  print(fn)
  local str = io.loaddata(fn)
  --print(str)
  str = regimes.toregime("utf", str, "?")
  io.savedata(fn .. "~", str)
end
--
cvt("_01-Identifikacni-Udaje.mkiv")
\stopluacode
----
But ConTeXt (yesterday's beta) fails with:
----
lua error > lua error on line 17 in file X://Users/LPr/~/~Asci/Cvt2UTF8.mkiv:
...eta/tex/texmf-context/tex/context/base/mkiv/regi-ini.lua:127: bad argument #1 to 'for iterator' (table expected, got boolean)
stack traceback:
[C]: in function 'for iterator'
...eta/tex/texmf-context/tex/context/base/mkiv/regi-ini.lua:127: in function '__index'
...eta/tex/texmf-context/tex/context/base/mkiv/regi-ini.lua:182: in function 'toregime'
[ctxlua]:7: in function 'cvt'
[ctxlua]:14: in main chunk
7 local str = io.loaddata(fn)
8
9 str = regimes.toregime("utf", str, "?")
10
11 io.savedata(fn .. "~", str)
12 end
13
14 --
15
16 cvt("_01-Identifikacni-Udaje.mkiv")
17 >> \stopluacode
----
What's wrong with my code?
Any help would be appreciated.
Best regards,
Lukas
On Wed, 14 Jun 2017 14:21:47 +0200, Hans Hagen wrote:
regimes.toregime('8859-1',"abcde Ä","?")
there's also fromregime
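One possible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: Pablo's usage note describes the first argument of regimes.toregime as the target encoding, so the call above asks for a conversion into a regime named "utf", and the "table expected, got boolean" message may simply mean that no such regime table was found. Going from CP1250 to UTF-8 is the opposite direction, which is presumably what the fromregime mentioned by Hans covers. The sketch below separates file handling from decoding; `read_all`, `write_all` and `convert_file` are hypothetical helpers, and the signature `regimes.fromregime(regime, str)` is assumed, not stated anywhere in this thread.

```lua
-- Read a whole file as a byte string (plain Lua I/O, usable outside ConTeXt).
local function read_all(path)
  local f = assert(io.open(path, "rb"))
  local data = f:read("*a")
  f:close()
  return data
end

-- Write a byte string to a file, replacing any previous content.
local function write_all(path, data)
  local f = assert(io.open(path, "wb"))
  f:write(data)
  f:close()
end

-- Hypothetical helper: decode file `src` and write the result to `dst`.
-- `decode` is any function that maps regime bytes to a UTF-8 string.
local function convert_file(src, dst, decode)
  write_all(dst, decode(read_all(src)))
end

-- Inside ConTeXt the decoder could wrap the function Hans mentions.
-- ASSUMPTION: regimes.fromregime(regime, str) returns the UTF-8 string.
-- Outside ConTeXt the global `regimes` is nil and `decode` stays nil.
local decode = regimes and regimes.fromregime and function(s)
  return regimes.fromregime("cp1250", s)
end

if decode then
  local fn = "_01-Identifikacni-Udaje.mkiv"
  convert_file(fn, fn .. "~", decode)
end
```

Passing the decoder as a parameter keeps the file handling testable without ConTeXt and makes it easy to swap in a different regime.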
participants (3)
- Hans Hagen
- Pablo Rodriguez
- Procházka Lukáš Ing.