[NTG-context] issue with scite module

Max Chernoff mseven at telus.net
Wed Jun 1 23:58:13 CEST 2022


> Now, I still don’t understand LPEG and don’t know if there’s a general
> “character” class that doesn’t need a list...

Well, looking through the XML spec

     https://www.w3.org/TR/REC-xml/#NT-NameChar

you'd think that we'd want a pattern like this:

     local name = (R("az","AZ","09", "\u{C0}\u{D6}", "\u{D8}\u{F6}", "\u{F8}\u{2FF}", "\u{370}\u{37D}", "\u{37F}\u{1FFF}", "\u{200C}\u{200D}", "\u{2070}\u{218F}", "\u{2C00}\u{2FEF}", "\u{3001}\u{D7FF}", "\u{F900}\u{FDCF}", "\u{FDF0}\u{FFFD}", "\u{10000}\u{EFFFF}", "\u{0300}\u{036F}", "\u{203F}\u{2040}") + S("_-.\u{B7}"))^1

But that doesn't work, since lpeg.R only handles single-byte ranges.
From the LuaTeX manual:

> The same is true for lpeg.R, although the latter will display an error message if used
> with multibyte characters. Therefore lpeg.R('aä') results in the message bad argument #1
> to 'R' (range must have two characters), since to lpeg, ä is two ’characters’ (bytes), so
> aä totals three. (https://texdoc.org/serve/luatex/0##680)
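
To make the byte counting concrete, here is a minimal sketch in plain Lua
(no LPEG needed). It assumes Lua 5.3 or later for the "\u{...}" escapes and
the utf8 library; the variable names are only illustrative.

```lua
-- Assumes Lua 5.3+ ("\u{...}" escapes and the utf8 library).
local a_umlaut = "\u{E4}"          -- "ä", encoded in UTF-8 as the bytes C3 A4
local range    = "a" .. a_umlaut   -- the string one would hand to lpeg.R

print(#a_umlaut)         -- length in bytes of "ä"
print(#range)            -- length in bytes of "aä": not 2, so lpeg.R complains
print(utf8.len(range))   -- length in code points of "aä"
```

The # operator counts bytes, which is the view that lpeg.R has of its
argument; utf8.len counts code points, which is what the range was meant
to express.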

The easiest way that I found was to just cheat and use everything with
a TeX catcode 11 ("letters"):

     local name = (R("az","AZ","09") + S("_-.") + lpeg.utfchartabletopattern(characters.csletters))^1

This isn't strictly speaking correct, since the catcode 11 characters
aren't the same set as the XML name characters, but I think that it's
close enough. It seems to work correctly for Pablo's initial example,
but it may break something else.
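
For comparison, a stricter check would have to test code points rather than
bytes. The following is only a rough sketch in plain Lua (5.3+), not
something from the lexer: the ranges are copied from the R/S pattern above,
and the names ranges, is_name_char and is_name are hypothetical.

```lua
-- Assumes Lua 5.3+ (utf8 library). All names here are hypothetical.
-- Ranges copied from the R(...) + S(...) pattern quoted earlier.
local ranges = {
    { 0x61, 0x7A }, { 0x41, 0x5A }, { 0x30, 0x39 },      -- az, AZ, 09
    { 0xC0, 0xD6 }, { 0xD8, 0xF6 }, { 0xF8, 0x2FF },
    { 0x370, 0x37D }, { 0x37F, 0x1FFF }, { 0x200C, 0x200D },
    { 0x2070, 0x218F }, { 0x2C00, 0x2FEF }, { 0x3001, 0xD7FF },
    { 0xF900, 0xFDCF }, { 0xFDF0, 0xFFFD }, { 0x10000, 0xEFFFF },
    { 0x300, 0x36F }, { 0x203F, 0x2040 },
    { 0x5F, 0x5F }, { 0x2D, 0x2E }, { 0xB7, 0xB7 },      -- "_", "-", ".", U+00B7
}

-- True if the code point cp falls inside one of the ranges above.
local function is_name_char(cp)
    for _, r in ipairs(ranges) do
        if cp >= r[1] and cp <= r[2] then
            return true
        end
    end
    return false
end

-- True if s is non-empty, valid UTF-8, and made only of name characters.
local function is_name(s)
    if s == "" or not utf8.len(s) then
        return false
    end
    for _, cp in utf8.codes(s) do
        if not is_name_char(cp) then
            return false
        end
    end
    return true
end

print(is_name("caf\u{E9}"))   -- true: U+00E9 lies in the D8..F6 range
print(is_name("a b"))         -- false: the space is not a name character
```

A predicate like this could probably be hooked into an LPEG pattern with a
match-time capture (lpeg.Cmt), but I haven't tried that in the lexer.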

-- Max

diff --git a/texmf-context/context/data/scite/context/lexers/scite-context-lexer-xml.original b/texmf-context/context/data/scite/context/lexers/scite-context-lexer-xml.lua
index e635d40..97de3fd 100644
--- a/texmf-context/context/data/scite/context/lexers/scite-context-lexer-xml.original
+++ b/texmf-context/context/data/scite/context/lexers/scite-context-lexer-xml.lua
@@ -41,7 +41,7 @@ local semicolon        = P(";")
  local equal            = P("=")
  local ampersand        = P("&")
  
-local name             = (R("az","AZ","09") + S("_-."))^1
+local name             = (R("az","AZ","09") + S("_-.") + lpeg.utfchartabletopattern(characters.csletters))^1
  local openbegin        = P("<")
  local openend          = P("</")
  local closebegin       = P("/>") + P(">")





