Problem processing a non-UTF8 Lua script
Hello,
I have a problem processing a Lua file, similar to the \typefile problem described several days (weeks?) ago.
Let's have the following files:
- Ctx:
---
\mainlanguage[cz]
\enableregime[cp1250]
\starttext
AAA
\startluacode
local t = dofile("t.lua")
context(t)
\stopluacode
\stoptext
---
- And the t.lua:
---
-- return "111" -- This worked OK
return "žšč" -- This is problem for Ctx
---
When I run t.lua from the Windows console with "Lua -e dofile('t.lua')", there is no problem.
When I run the Ctx file, it has a problem processing characters with diacritics (like "čřž..."). The t.lua file is not encoded in UTF-8 (which Ctx-Lua seems to assume); I'm using the cp1250 code page, where each character takes 1 byte.
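To illustrate the mismatch described above, here is a minimal sketch (in Python rather than Lua, purely for demonstration) of how the same three characters look as cp1250 bytes and as UTF-8 bytes. The names `text` and `is_valid_utf8` are illustrative and not part of ConTeXt or LuaTeX; the check only mimics the kind of validation that the "invalid utf-8 sequence" message suggests, it is not LuaTeX's actual implementation.

```python
# The three characters from t.lua, written as escapes: "žšč"
text = "\u017e\u0161\u010d"

# In cp1250 each of these characters is a single byte.
cp1250_bytes = text.encode("cp1250")
print(cp1250_bytes)  # b'\x9e\x9a\xe8'

# In UTF-8 each of these characters takes two bytes.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'\xc5\xbe\xc5\xa1\xc4\x8d'


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The cp1250 bytes are not a well-formed UTF-8 sequence,
# while the re-encoded bytes are.
print(is_valid_utf8(cp1250_bytes))  # False
print(is_valid_utf8(utf8_bytes))    # True
```

Plain Lua treats strings as raw bytes, which would explain why the console run succeeds while the string is rejected once it is handed over for typesetting.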
The error message is:
---
MTXrun | run 1: luatex --fmt="c:/ConTeXt/tex/texmf-cache/luatex-cache/context/f53042fa2e1c106bc7e3383ec8c3a00c/formats/cont-en" --lua="c:/ConTeXt/tex/texmf-cache/luatex-cache/context/f53042fa2e1c106bc7e3383ec8c3a00c/formats/cont-en.lui" --backend=pdf "D:/Lukas/ConTeXt/Samples/U8/t-U8.mkiv"
This is LuaTeX, Version beta-0.63.0-2010090921 (rev 3873)
\write18 enabled.
(D:/Lukas/ConTeXt/Samples/U8/t-U8.mkiv
jobcontrol > resuming randomizer with 0.50767540513321
ConTeXt ver: 2010.10.20 21:33 MKIV fmt: 2010.10.21 int: english/english
system : cont-new loaded
(c:/ConTeXt/tex/texmf-context/tex/context/base/cont-new.tex
systems : beware: some patches loaded from cont-new.tex
(c:/ConTeXt/tex/texmf-context/tex/context/base/cont-new.mkiv))
system : cont-fil.mkiv loaded
(c:/ConTeXt/tex/texmf-context/tex/context/base/cont-fil.mkiv
loading : ConTeXt File Synonyms
)
system : cont-sys.rme loaded
(c:/ConTeXt/tex/texmf-context/tex/context/user/cont-sys.rme (c:/ConTeXt/tex/texmf-context/tex/context/base/type-def.mkiv) (c:/ConTeXt/tex/texmf-context/tex/context/base/type-lua.mkiv) (c:/ConTeXt/tex/texmf-context/tex/context/base/type-siz.mkiv) (c:/ConTeXt/tex/texmf-context/tex/context/base/type-otf.mkiv))
system : cont-err loaded
(c:/ConTeXt/tex/texmf-context/tex/context/base/cont-err.tex
systems : no file 'cont-sys.tex', using 'cont-sys.rme' instead
)
system : t-U8.top loaded
(t-U8.top
)
fonts : preloading latin modern fonts
{c:/ConTeXt/tex/texmf/fonts/map/dvips/lm/lm-math.map}{c:/ConTeXt/tex/texmf/fonts/map/dvips/lm/lm-rm.map}{c:/ConTeXt/tex/texmf-context/fonts/map/pdftex/context/mkiv-base.map}
bodyfont : 12pt rm is loaded
language : language en is active
publications : loading formatting style from bxml-apa
(c:/ConTeXt/tex/texmf-context/tex/context/base/bxml-apa.mkiv)
systems : begin file D:/Lukas/ConTeXt/Samples/U8/t-U8.mkiv at line 4
! String contains an invalid utf-8 sequence.
system > error on line 1 in file D:/Lukas/ConTeXt/Samples/U8/t-U8.mkiv: String contains an invalid utf-8 sequence ...
1 >> \mainlanguage[cz]
2 \enableregime[cp1250]
3
4 \starttext
5 AAA
6
7 \startluacode
8 local t = dofile("t.lua")
9
10 context(t)
11 \stopluacode
l.1
×ÜŔ
} context(t)> ...le("t.lua")
\dodostartluacode ...d \directlua \zerocount {#1}}
l.11 \stopluacode
backends > using xmp file 'c:/ConTeXt/tex/texmf-context/tex/context/base/lpdf-pdx.xml'
pages > flushing realpage 1, userpage 1, subpage 1
systems : end file D:/Lukas/ConTeXt/Samples/U8/t-U8.mkiv at line 12
)
2010/10/21 Vedran Miletić:
> 2010/10/21 Procházka Lukáš:
>> Or how to make Ctx work with non-UTF8 Lua files?
> Notepad supports saving to UTF-8. Can't you rather convert your files to it?
Come on ... Lukáš is merely trying to remove all the codepage-related problems in MKIV :) :) :)
(Hans will hate me for having suggested to support \enableregime[cp1250] in MKIV in the first place, else there would at least be an excuse such as "sorry, but luatex doesn't support anything but utf-8" :)
Mojca
PS: even vim uses cp1250 on windows 7 by default
On Thu, Oct 21, 2010 at 18:55, Mojca Miklavec wrote:
> Come on ... Lukáš is merely trying to remove all the codepage-related problems in MKIV :) :) :)
But then ... I admit that this case is so ugly that even I'm not sure if I would want to fix it and support it. I can imagine arbitrarily complex mixtures of tex/lua/metapost/index sorting routines calling each other recursively ... and sooner or later this would probably break in one way or another even if some particular case gets fixed.
Mojca
On 21-10-2010 6:55, Mojca Miklavec wrote:
> (Hans will hate me for having suggested to support \enableregime[cp1250] in MKIV in the first place, else there would at least be an excuse such as "sorry, but luatex doesn't support anything but utf-8" :)
Don't worry, I have no problem with enableregime, but messing around with lua scripts and locales and codepages is asking for trouble and I'm not going to waste time on that.
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
| www.pragma-pod.nl
-----------------------------------------------------------------
Well, I've done it the proposed way in this particular case.
(I.e., I have just one UTF-8 encoded file, and it is the only text file overall, for this one case.
TextPad supports various encodings as well.
I was not sure whether the text editor would be able to determine the encoding used, so I worried about having to select the encoding every time I wanted to edit this Lua file.
But, fortunately, the editor is able to detect the UTF-8 encoding, and CP1250 as well.
So once I have saved this Lua file with UTF-8 encoding, I don't have to reselect it.)
Lukas
On Thu, 21 Oct 2010 17:49:43 +0200, Vedran Miletić wrote:
> 2010/10/21 Procházka Lukáš:
>> Or how to make Ctx work with non-UTF8 Lua files?
> Notepad supports saving to UTF-8. Can't you rather convert your files to it?
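For anyone who prefers to script the conversion suggested in this thread instead of re-saving the file from an editor, the following is a minimal sketch in Python. The function name `convert_to_utf8` and the output file name `t-utf8.lua` are hypothetical, and the sketch assumes the source file really is cp1250; a file in another code page would be silently mis-converted or raise a decoding error.

```python
from pathlib import Path


def convert_to_utf8(src: str, dst: str, source_encoding: str = "cp1250") -> None:
    """Re-encode a text file from source_encoding to UTF-8.

    The file is read and written as bytes so that line endings are
    preserved exactly as they are in the source file.
    """
    raw = Path(src).read_bytes()
    # Raises UnicodeDecodeError if a byte is undefined in source_encoding.
    text = raw.decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical file names: convert t.lua into a UTF-8 copy.
    if Path("t.lua").exists():
        convert_to_utf8("t.lua", "t-utf8.lua")
```

Writing to a separate file rather than overwriting the original keeps the cp1250 version around in case the assumed source encoding turns out to be wrong.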
participants (5)
- Arthur Reutenauer
- Hans Hagen
- Mojca Miklavec
- Procházka Lukáš
- Vedran Miletić