Hi all,

I'm working on my Greek module again and am trying to filter and massage the input via lpeg, but there's something I don't quite get. As a minimal example: suppose I want to substitute A and B in my input with X and leave all other letters alone. Here's my attempt:

\startluacode
do
    local replace = {
        A = "X",
        B = "X",
    }
    local dosub = (lpeg.Cs(1)) / replace
    local subs  = (dosub)^0
    function test (string)
        tex.sprint(lpeg.match(subs, string))
    end
end
\stopluacode

\def\Substitute#1{\ctxlua{test("#1")}}

\starttext
\Substitute{ABC}
\stoptext

It substitutes alright, but the "C" is not included in the stream which ctxlua gives to TeX. How can I modify my lpeg pattern?

Thanks, and all best

Thomas
Thomas A. Schmitz wrote:
brrr ... massaging input ... can be dangerous ... anyhow, here you go:

\startluacode
local replace = {
    A = "X",
    B = "X",
}
setmetatable(replace, { __index = function(t,k) return k end })
local dosub = (lpeg.Cs(1)) / replace
local subs  = (dosub)^0
function test (string)
    tex.sprint(lpeg.match(subs, string))
end
\stopluacode

\def\Substitute#1{\ctxlua{test("#1")}}

\starttext
\Substitute{thomas ABC whatever}
\stoptext

and yes, it's slow; the next variant is faster but takes a bit more memory (negligible compared to what is already taken):

setmetatable(replace, { __index = function(t,k) t[k] = k return k end })

Hans
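For reference, a minimal standalone sketch in plain Lua (assuming only the lpeg module is available) of why the original pattern drops the "C", and of an alternative that avoids the metatable: with patt / table, a key that is missing from the table produces no capture value at all, so restricting the query capture to the known keys and letting a surrounding Cs copy every other character through unchanged gives the same effect.

local lpeg = require("lpeg")

local replace = { A = "X", B = "X" }

-- the query capture only covers keys that are actually in the table,
-- so a miss like "C" can never yield an empty capture
local keyed = lpeg.S("AB") / replace

-- the surrounding Cs copies every character without a capture unchanged
local subs = lpeg.Cs((keyed + 1)^0)

print(lpeg.match(subs, "thomas ABC whatever")) -- thomas XXC whatever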
On Aug 11, 2009, at 12:59 PM, Hans Hagen wrote:
brrr ... massaging input ... can be dangerous ... anyhow, here you go
Thanks Hans! I know it's not a good thing, but I do want to find a method to support ASCII transliteration in mkiv. I have learnt lots of interesting things about fea files in the past, the most important being that they are not the way to go (something that Taco had told me very early in my attempts; I should have listened to him...). So now I'm trying to transform the input via lpeg. It's just a stopgap, but maybe better than nothing.

Thanks, all best

Thomas
Thomas A. Schmitz wrote:
what exactly do you want to replace?

Hans
On Aug 11, 2009, at 2:35 PM, Hans Hagen wrote:
what exactly do you want to replace?
Hans
I'm trying to use the lpegs you have written for mtx-babel.lua, but instead of rewriting the Greek ASCII stuff to a new file, I want to convert it to proper UTF-8 Greek and feed that to mkiv. As I said, it's a stopgap, but better than nothing...

Thomas
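As an illustration of that kind of transliteration step, here is a small sketch in plain Lua, assuming only the lpeg module; the mapping table is purely made up for the example and is not the one from mtx-babel.lua, which also has to cover breathings, accents and iota subscripts.

local lpeg = require("lpeg")

-- illustrative mapping only, not the real mtx-babel.lua tables
local greek = {
    a = "α", b = "β", g = "γ", d = "δ", e = "ε",
}

-- collect the keys and sort them longest first, so that multi-character
-- transliteration sequences would win over their single-character prefixes
local keys = {}
for k in pairs(greek) do
    keys[#keys+1] = k
end
table.sort(keys, function(x, y) return #x > #y end)

local pattern = lpeg.P(false)
for _, k in ipairs(keys) do
    pattern = pattern + lpeg.P(k)
end

-- replace known sequences via the table, copy everything else unchanged
local toutf = lpeg.Cs(((pattern / greek) + 1)^0)

print(lpeg.match(toutf, "abgde xyz")) -- αβγδε xyz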
Thomas A. Schmitz wrote:
if there is more demand for that i can consider making a substituter that operates on the node list in an early stage; that way it is controlled by attributes and there is no interference with macro definitions, reading modules and such

Hans
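A rough sketch of what such an attribute-controlled substituter could look like on the node list; the attribute number, the mapping and the way the function would be hooked into ConTeXt's node processing are assumptions for illustration only, not an actual implementation.

-- hypothetical attribute number; a real implementation would allocate
-- one through the usual ConTeXt/LuaTeX mechanisms
local translit_attribute = 8000

-- codepoint-to-codepoint mapping, again purely illustrative
local map = {
    [0x61] = 0x03B1, -- a -> α
    [0x62] = 0x03B2, -- b -> β
}

local glyph_id = node.id("glyph")

local function substitute_glyphs(head)
    for n in node.traverse_id(glyph_id, head) do
        -- only touch glyphs that carry the transliteration attribute
        if node.has_attribute(n, translit_attribute) then
            local replacement = map[n.char]
            if replacement then
                n.char = replacement
            end
        end
    end
    return head
end

-- hooking substitute_glyphs into an early node-processing pass is
-- ConTeXt-specific and left out here; in plain LuaTeX it could be
-- registered as, say, a pre_linebreak_filter callback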
Hans Hagen wrote:
Macro interaction may be an issue, but I believe it is still better for transliterations to work on the actual input strings or on tokens. For example, you may want to run macros (like \delimitedtext) on the converted output. If I had to do this myself, I would probably work on token lists, even though it is quite a bit less convenient than strings. I remember we have talked about writing extended lpegs that work directly on token and node lists; that would perhaps be the nicest solution in the long run. Anyway, I am just thinking out loud.

Best wishes,
Taco
Taco Hoekwater wrote:
my main concern with that is that one then needs to control precisely where to apply such translations; for instance, turning a< into something else might also mess up math, and it would add all kinds of extra checking and housekeeping (for instance when loading modules or whatever in the middle of such a conversion)

of course, when the to-be-converted fragments are tagged it's trivial to use lpeg and avoid \cs's

btw, i think that \delimitedtext would work anyway, as we only replace "glyph a glyph<" by something else then

anyway, it all depends on the task, and hopefully unicode will solve all our problems (and not introduce more)

Hans
On Aug 12, 2009, at 10:32 AM, Hans Hagen wrote:
Well, in my case the fragments are already delimited, so it's relatively easy. However, I wonder whether there are many applications for this. I don't see too many, but maybe I'm wrong. From my POV, there is no need for this in the core, but maybe others see more usage. Thomas
participants (3)
- Hans Hagen
- Taco Hoekwater
- Thomas A. Schmitz