rfc4180splitter not handling UTF-8 with BOM files
Hi!

I ran into a problem reading certain CSV files. I nailed it down to the following example:

\starttext
\startluacode
local mycsvsplitter = utilities.parsers.rfc4180splitter{
    separator = ",",
    quote     = '"',
}

-- fails with
-- token call, execute: [ctxlua]:11: attempt to index a nil value (local 'tablerows')
-- local mycsv = io.loaddata("A.csv")

-- works
local mycsv = io.loaddata("B.csv")

local tablerows = mycsvsplitter(mycsv)

context(tablerows[1][1])
context(" ")
context(tablerows[1][2])
\stopluacode
\stoptext

The compilation fails with

  token call, execute: [ctxlua]:11: attempt to index a nil value (local 'tablerows')

The two files are attached. The only difference is that:

  A.csv: Unicode text, UTF-8 (with BOM) text
  B.csv: ASCII text

Somehow the rfc4180splitter chokes on UTF-8 files with a BOM. io.loaddata succeeds as far as I can tell. Is there a way to read in those files without pre-processing them?

Marco

version: 2024.09.17 13:15
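For reference, a minimal pre-processing workaround (assuming io.loaddata returns the raw bytes of the file, which it does in MkIV/LMTX) is to strip the three-byte UTF-8 BOM, EF BB BF, before handing the string to the splitter. This is only a sketch, not an answer to the question of avoiding pre-processing altogether:

\starttext
\startluacode
local mycsvsplitter = utilities.parsers.rfc4180splitter{
    separator = ",",
    quote     = '"',
}

-- load the raw file contents
local mycsv = io.loaddata("A.csv")

-- strip a leading UTF-8 BOM (bytes EF BB BF) if present
if mycsv and mycsv:sub(1, 3) == "\239\187\191" then
    mycsv = mycsv:sub(4)
end

local tablerows = mycsvsplitter(mycsv)

context(tablerows[1][1])
context(" ")
context(tablerows[1][2])
\stopluacode
\stoptext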
On 11/23/2024 8:40 PM, Marco Patzer wrote:
Somehow the rfc4180splitter chokes on UTF-8 with BOM files. io.loaddata succeeds as far as I can tell. Is there a way to read in those files without pre-processing them?
I'll send you a patch to test.

Hans

-----------------------------------------------------------------
                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
       tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
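For illustration only (this is a hypothetical sketch, not the actual patch): since the splitter is lpeg based, a fix inside it could make the grammar consume an optional leading BOM before the first record, roughly along these lines:

-- hypothetical sketch, not the real patch: an lpeg pattern that
-- optionally consumes a leading UTF-8 BOM (bytes EF BB BF)
local P   = lpeg.P
local bom = P("\239\187\191")^-1
-- the splitter's grammar would then start matching with: bom * record * ...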
participants (2)
- Hans Hagen
- Marco Patzer