Hi all,

this is a bit OT and should probably go to a lua list, but since some people here are very proficient in lua and I feel less embarrassed about noob questions here... I have a half-functioning python script to convert entries from a classics database into the bibtex format. I want to rewrite it in lua and make it more functional. Three little problems/questions:

1. I found a script to convert Roman numerals via lpeg here: http://lua-users.org/wiki/LpegRecipes but it uses the syntax lpeg.Ca which my lpeg doesn't recognize and which I can't find in the lpeg manual. According to a talk by Roberto Ierusalimschy, "lpeg.Ca(patt) - "accumulates" the nested captures." (http://www.inf.puc-rio.br/~roberto/lpeg/slides-lpeg-workshop2008.pdf) Is this obsolete, has it been replaced by anything?

2. How can I check if a string begins with a class of words "(Der |Die |Das |The |An )" etc. and strip these words from the string? I do it with a compiled regexp in python, but "Programming in lua" has this to say: "Unlike some other systems, in Lua a modifier can only be applied to a character class; there is no way to group patterns under a modifier. For instance, there is no pattern that matches an optional word (unless the word has only one letter). Usually you can circumvent this limitation using some of the advanced techniques that we will see later." I haven't found these techniques yet.

3. How can I compare strings with utf8 characters? My naive approach if string.find(record, "Résumé") doesn't appear to work (while the same method does work if the string has only ASCII characters).

Sorry if this is OT, and I'll be grateful for any pointers.

Thomas
Thomas A. Schmitz wrote:
Hi all,
this is a bit OT and should probably go to a lua list, but since some people here are very proficient in lua and I feel less embarrassed about noob questions here... I have a half-functioning python script to convert entries from a classics database into the bibtex format. I want to rewrite it in lua and make it more functional. Three little problems/questions:
1. I found a script to convert Roman numerals via lpeg here: http://lua-users.org/wiki/LpegRecipes but it uses the syntax lpeg.Ca which my lpeg doesn't recognize and which I can't find in the lpeg manual. According to a talk by Roberto Ierusalimschy, "lpeg.Ca(patt) - "accumulates" the nested captures." (http://www.inf.puc-rio.br/~roberto/lpeg/slides-lpeg-workshop2008.pdf) Is this obsolete, has it been replaced by anything?
here is a variant that implements a function (and does not use the env trick)

do
    local add = function(x, y) return x + y end
    local P, Ca, Cc = lpeg.P, lpeg.Ca, lpeg.Cc
    -- subtractive pairs (IV, IX, XL, XC, CD, CM) are treated as symbols of their own
    local symbols = { I=1, V=5, X=10, L=50, C=100, D=500, M=1000, IV=4, IX=9, XL=40, XC=90, CD=400, CM=900 }
    local adders = { }
    for s, n in pairs(symbols) do
        adders[s] = P(s) * Cc(n) / add
    end
    local MS = adders.M^0
    local CS = (adders.D*adders.C^(-4) + adders.CD + adders.CM + adders.C^(-4))^(-1)
    local XS = (adders.L*adders.X^(-4) + adders.XL + adders.XC + adders.X^(-4))^(-1)
    local IS = (adders.V*adders.I^(-4) + adders.IX + adders.IV + adders.I^(-4))^(-1)
    -- Cc(0) seeds the accumulator; every matched symbol adds its value
    local p = Ca(Cc(0) * MS * CS * XS * IS)
    function string:romantonumber()
        return p:match(self:upper())
    end
end

print(string.romantonumber("MMIX"))
print(string.romantonumber("MMIIIX"))

just run such a script using

mtxrun --script yourscript.lua

as luatex (texlua) has the latest lpeg built in
2. How can I check if a string begins with a class of words "(Der |Die |Das |The |An )" etc. and strip these words from the string? I do it with a compiled regexp in python, but "Programming in lua" has this to say: "Unlike some other systems, in Lua a modifier can only be applied to a character class; there is no way to group patterns under a modifier. For instance, there is no pattern that matches an optional word (unless the word has only one letter). Usually you can circumvent this limitation using some of the advanced techniques that we will see later." I haven't found these techniques yet.
local stripped = { "Der", "Die", "Das" }
local p = lpeg.P(false)
for k, v in ipairs(stripped) do p = p + lpeg.P(v) end
local w = p * " "
local stripper = lpeg.Cs(((w/"") + lpeg.C(1))^0)
lpeg.print(stripper)
str = "Germans somehow always talk about Der Thomas and Der Hans"
print(stripper:match(str))
3. How can I compare strings with utf8 characters? My naive approach if string.find(record, "Résumé") doesn't appear to work (while the same method does work if the string has only ASCII characters).
since lua is 8 bit clean utf should just work

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
On Jan 23, 2009, at 12:13 PM, Hans Hagen wrote:
here is a variant that implements a function (and does not use the env trick)
do
    local add = function(x, y) return x + y end
    local P, Ca, Cc = lpeg.P, lpeg.Ca, lpeg.Cc
    -- subtractive pairs (IV, IX, XL, XC, CD, CM) are treated as symbols of their own
    local symbols = { I=1, V=5, X=10, L=50, C=100, D=500, M=1000, IV=4, IX=9, XL=40, XC=90, CD=400, CM=900 }
    local adders = { }
    for s, n in pairs(symbols) do
        adders[s] = P(s) * Cc(n) / add
    end
    local MS = adders.M^0
    local CS = (adders.D*adders.C^(-4) + adders.CD + adders.CM + adders.C^(-4))^(-1)
    local XS = (adders.L*adders.X^(-4) + adders.XL + adders.XC + adders.X^(-4))^(-1)
    local IS = (adders.V*adders.I^(-4) + adders.IX + adders.IV + adders.I^(-4))^(-1)
    -- Cc(0) seeds the accumulator; every matched symbol adds its value
    local p = Ca(Cc(0) * MS * CS * XS * IS)
    function string:romantonumber()
        return p:match(self:upper())
    end
end

print(string.romantonumber("MMIX"))
print(string.romantonumber("MMIIIX"))
just run such a script using

mtxrun --script yourscript.lua

as luatex (texlua) has the latest lpeg built in
Brilliant! This one does work when I use it with luatex (though not with my system lua, even though I have the latest released version of lpeg, 0.9, installed). Bizarre...
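A likely explanation for the difference, offered as a sketch rather than a confirmed diagnosis: LPeg 0.9 replaced the accumulator capture lpeg.Ca with the fold capture lpeg.Cf, so code written for lpeg.Ca needs small changes on a newer standalone LPeg. The variant below assumes an LPeg that provides lpeg.Cf. The function name roman_to_number is made up for illustration, and the trailing -1 (an end-of-input check) is an addition so that malformed input returns nil instead of a partial result.

```lua
local lpeg = require("lpeg")
local P, Cc, Cf = lpeg.P, lpeg.Cc, lpeg.Cf

local symbols = {
  I = 1, V = 5, X = 10, L = 50, C = 100, D = 500, M = 1000,
  IV = 4, IX = 9, XL = 40, XC = 90, CD = 400, CM = 900,
}

-- Each symbol matches its text and produces its value as a constant capture.
local v = {}
for s, n in pairs(symbols) do
  v[s] = P(s) * Cc(n)
end

local MS = v.M^0
local CS = (v.D * v.C^-4 + v.CD + v.CM + v.C^-4)^-1
local XS = (v.L * v.X^-4 + v.XL + v.XC + v.X^-4)^-1
local IS = (v.V * v.I^-4 + v.IX + v.IV + v.I^-4)^-1

-- Cf folds all captures from left to right with the given function;
-- Cc(0) supplies the initial value of the sum.
local function add(acc, n) return acc + n end
local roman = Cf(Cc(0) * MS * CS * XS * IS, add) * -1

-- Hypothetical helper name, chosen for this example only.
local function roman_to_number(s)
  return roman:match(s:upper())
end

print(roman_to_number("MMIX"))     --> 2009
print(roman_to_number("mcmxciv"))  --> 1994
print(roman_to_number("MMIIIX"))   --> nil (not a well-formed numeral)
```

Compared with the lpeg.Ca version, the only structural change is that the addition is no longer attached to every symbol; the symbols just produce numbers and the fold adds them up at the end.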
2. How can I check if a string begins with a class of words "(Der |Die |Das |The |An )" etc. and strip these words from the string? I do it with a compiled regexp in python, but "Programming in lua" has this to say: "Unlike some other systems, in Lua a modifier can only be applied to a character class; there is no way to group patterns under a modifier. For instance, there is no pattern that matches an optional word (unless the word has only one letter). Usually you can circumvent this limitation using some of the advanced techniques that we will see later." I haven't found these techniques yet.
local stripped = { "Der", "Die", "Das" }
local p = lpeg.P(false)
for k, v in ipairs(stripped) do p = p + lpeg.P(v) end
local w = p * " "
local stripper = lpeg.Cs(((w/"") + lpeg.C(1))^0)
lpeg.print(stripper)
str = "Germans somehow always talk about Der Thomas and Der Hans"
print(stripper:match(str))
Brilliant again! I can run with that, looks great! And who doesn't want a "local stripper" in his code?
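Note that the lpeg pattern above removes the listed words wherever they occur in the string, while the original question asked only about the beginning of the string. For that narrower case, a minimal sketch in plain Lua (no lpeg, no patterns) could loop over the candidate words and compare prefixes. The function name strip_leading_article and the sample titles are invented for this example.

```lua
-- Articles to strip, each including its trailing space so that
-- "Der " does not match the start of a word like "Derrida".
local articles = { "Der ", "Die ", "Das ", "The ", "An " }

-- Returns the title without a leading article, or the title unchanged
-- if it does not start with one of the listed words.
local function strip_leading_article(title)
  for _, article in ipairs(articles) do
    if title:sub(1, #article) == article then
      return title:sub(#article + 1)
    end
  end
  return title
end

print(strip_leading_article("Der Zauberberg"))          --> Zauberberg
print(strip_leading_article("An Introduction to Lua"))  --> Introduction to Lua
print(strip_leading_article("Derrida on Plato"))        --> Derrida on Plato
print(strip_leading_article("Thomas and Der Hans"))     --> Thomas and Der Hans
```

The comparison is case-sensitive and only the first matching article is removed, which mirrors an anchored regexp alternation.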
3. How can I compare strings with utf8 characters? My naive approach if string.find(record, "Résumé") doesn't appear to work (while the same method does work if the string has only ASCII characters).
since lua is 8 bit clean utf should just work
OK, then the problem must be somewhere else. I'll investigate.

Thanks a lot, and best wishes

Thomas
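One place to look, as a guess and not a diagnosis of the actual script: string.find compares bytes, so the search string and the record must use the same encoding. The sketch below writes "Résumé" with explicit byte escapes so that it does not depend on how the source file is saved; the record text is invented for the example.

```lua
-- "Résumé" as UTF-8 bytes (é is 0xC3 0xA9, i.e. \195\169).
local needle_utf8 = "R\195\169sum\195\169"
-- The same word in Latin-1 (é is the single byte 0xE9, i.e. \233).
local needle_latin1 = "R\233sum\233"

-- A made-up record that contains the word in UTF-8.
local record = "Curriculum vitae: " .. needle_utf8 .. " of the author"

-- The fourth argument (plain = true) switches off pattern matching,
-- so the needle is compared byte for byte.
local s, e = string.find(record, needle_utf8, 1, true)
print(s, e)  --> 19  26 (byte positions, not character positions)

-- A needle in a different encoding is simply a different byte sequence.
print(string.find(record, needle_latin1, 1, true))  --> nil
```

If the script file is saved as Latin-1 while the database records are UTF-8 (or the other way round), a literal "Résumé" in the source would behave like the second call.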
On Jan 23, 2009, at 12:13 PM, Hans Hagen wrote:
Thomas A. Schmitz wrote:
it uses the syntax lpeg.Ca which my lpeg doesn't recognize and which I can't find in the lpeg manual.
[useful information snipped]
just run such a script using

mtxrun --script yourscript.lua

as luatex (texlua) has the latest lpeg built in
Just one remark: my lpeg is

/*
** $Id: lpeg.c,v 1.98 2008/10/11 20:20:43 roberto Exp $

and doesn't have the lpeg.Ca pattern. The lpeg that comes with luatex is

/*
** $Id: lpeg.c,v 1.86 2008/03/07 17:20:19 roberto Exp $

so it's older, and it does have the lpeg.Ca pattern accumulator.

And can I ask one more question about lpeg? Suppose I have the string

"{\em This string is \quotation{heavily} emphasized.}"

and want to transform that into something like

"\color[red]{This string is \quotation{heavily} emphasized.}"

How would I go about this using lpeg? I must use a lpeg.V somewhere, but I can't figure out where and how.

Thanks, and all best

Thomas
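One way to approach the {\em ...} question, as a sketch under assumptions and not an answer taken from the thread: a small grammar in which lpeg.V refers back to the rule for the text inside braces, so that nested groups such as \quotation{heavily} are skipped over as balanced units. The rule names text, em and group are made up for this example, and the pattern assumes that braces in the input are balanced and not escaped.

```lua
local lpeg = require("lpeg")
local P, S, V, Cs = lpeg.P, lpeg.S, lpeg.V, lpeg.Cs

local grammar = P{
  "text",
  -- Any run of emphasis groups, other brace groups, and ordinary characters.
  text  = (V"em" + V"group" + (1 - S"{}"))^0,
  -- "{\em " is replaced by "\color[red]{"; the body and the closing
  -- brace are kept, and the body may itself contain nested groups.
  em    = (P"{\\em" * P" "^1) / "\\color[red]{" * V"text" * P"}",
  -- Any other balanced brace group is copied unchanged.
  group = P"{" * V"text" * P"}",
}

-- Cs returns the matched text with the replacements applied.
local convert = Cs(grammar)

local input = "{\\em This string is \\quotation{heavily} emphasized.}"
print(convert:match(input))
--> \color[red]{This string is \quotation{heavily} emphasized.}
```

Because em is tried before group, a brace group is treated as emphasis only when it really starts with \em followed by a space; everything else falls through to the generic group rule.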
participants (3): Hans Hagen, Taco Hoekwater, Thomas A. Schmitz