Now, I still don’t understand LPEG and don’t know if there’s a general “character” class that doesn’t need a list...
Well looking through the XML spec

    https://www.w3.org/TR/REC-xml/#NT-NameChar

you'd think that we'd want a pattern like this:

    local name = (R("az","AZ","09",
        "\u{C0}\u{D6}", "\u{D8}\u{F6}", "\u{F8}\u{2FF}",
        "\u{370}\u{37D}", "\u{37F}\u{1FFF}", "\u{200C}\u{200D}",
        "\u{2070}\u{218F}", "\u{2C00}\u{2FEF}", "\u{3001}\u{D7FF}",
        "\u{F900}\u{FDCF}", "\u{FDF0}\u{FFFD}", "\u{10000}\u{EFFFF}",
        "\u{0300}\u{036F}", "\u{203F}\u{2040}") + S("_-.\u{B7}"))^1

But that doesn't work, since
The same is true for lpeg.R, although the latter will display an error message if used with multibyte characters. Therefore lpeg.R('aä') results in the message bad argument #1 to 'R' (range must have two characters), since to lpeg, ä is two ’characters’ (bytes), so aä totals three. (https://texdoc.org/serve/luatex/0##680)
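
To make the byte issue concrete, here is a minimal sketch (not what the lexer does) that checks the UTF-8 length of "ä" and builds a single two-byte range by hand. The name latin_upper is made up for illustration, and the sketch assumes that every code point in the range U+00C0..U+00D6 shares the lead byte C3, which holds for this block but not for ranges that cross a lead-byte boundary.

```lua
-- In LuaTeX the lpeg table is built in; stand-alone Lua has to load it.
local lpeg = lpeg or require("lpeg")
local P, R = lpeg.P, lpeg.R

-- "ä" (U+00E4) is the two bytes C3 A4 in UTF-8, so Lua sees two "characters".
local a_umlaut = "\xC3\xA4"
print(#"a", #a_umlaut)

-- Hypothetical hand-built range for U+00C0..U+00D6:
-- lead byte C3 followed by a trail byte in 80..96.
local latin_upper = P("\xC3") * R("\x80\x96")

-- "Ä" (U+00C4) is C3 84 and falls inside the range;
-- "ä" (C3 A4) and plain "a" do not.
print(lpeg.match(latin_upper, "\xC3\x84"))
print(lpeg.match(latin_upper, a_umlaut))
print(lpeg.match(latin_upper, "a"))
```

Doing this by hand for every range in the XML spec gets tedious quickly, since a range that spans several lead bytes has to be split into several such byte-level patterns.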
The easiest way that I found was to just cheat and use everything with a TeX catcode 11 ("letters"):

    local name = (R("az","AZ","09") + S("_-.") + lpeg.utfchartabletopattern(characters.csletters))^1

This isn't strictly speaking correct, but I think that it's close enough. It seems to work correctly for Pablo's initial example, but it may break something else.

-- Max

diff --git a/texmf-context/context/data/scite/context/lexers/scite-context-lexer-xml.original b/texmf-context/context/data/scite/context/lexers/scite-context-lexer-xml.lua
index e635d40..97de3fd 100644
--- a/texmf-context/context/data/scite/context/lexers/scite-context-lexer-xml.original
+++ b/texmf-context/context/data/scite/context/lexers/scite-context-lexer-xml.lua
@@ -41,7 +41,7 @@ local semicolon = P(";")
 local equal = P("=")
 local ampersand = P("&")
-local name = (R("az","AZ","09") + S("_-."))^1
+local name = (R("az","AZ","09") + S("_-.") + lpeg.utfchartabletopattern(characters.csletters))^1
 local openbegin = P("<")
 local openend = P("")
 local closebegin = P("/>") + P(">")