rfc4180splitter not handling UTF-8 with BOM files
Hi!

I ran into a problem reading certain CSV files. I nailed it down to the following example:

\starttext
\startluacode
local mycsvsplitter = utilities.parsers.rfc4180splitter{
    separator = ",",
    quote     = '"',
}

-- fails with
-- token call, execute: [ctxlua]:11: attempt to index a nil value (local 'tablerows')
-- local mycsv = io.loaddata("A.csv")

-- works
local mycsv = io.loaddata("B.csv")

local tablerows = mycsvsplitter(mycsv)

context(tablerows[1][1])
context(" ")
context(tablerows[1][2])
\stopluacode
\stoptext

The compilation fails with

  token call, execute: [ctxlua]:11: attempt to index a nil value (local 'tablerows')

The two files are attached. The only difference is that:

  A.csv: Unicode text, UTF-8 (with BOM) text
  B.csv: ASCII text

Somehow the rfc4180splitter chokes on UTF-8 files with a BOM. io.loaddata succeeds as far as I can tell. Is there a way to read in those files without pre-processing them?

Marco

version: 2024.09.17 13:15
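For reference, a minimal pre-processing workaround (assuming io.loaddata returns the raw bytes of the file, which it does in MkIV/LMTX) is to strip the three-byte UTF-8 BOM, EF BB BF, before handing the string to the splitter. This is only a sketch, not an answer to the question of avoiding pre-processing altogether:

\starttext
\startluacode
local mycsvsplitter = utilities.parsers.rfc4180splitter{
    separator = ",",
    quote     = '"',
}

-- load the raw file contents
local mycsv = io.loaddata("A.csv")

-- strip a leading UTF-8 BOM (bytes EF BB BF) if present
if mycsv and mycsv:sub(1, 3) == "\239\187\191" then
    mycsv = mycsv:sub(4)
end

local tablerows = mycsvsplitter(mycsv)

context(tablerows[1][1])
context(" ")
context(tablerows[1][2])
\stopluacode
\stoptext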
On 11/23/2024 8:40 PM, Marco Patzer wrote:
Somehow the rfc4180splitter chokes on UTF-8 with BOM files. io.loaddata succeeds as far as I can tell. Is there a way to read in those files without pre-processing them?
I'll send you a patch to test.

Hans

-----------------------------------------------------------------
                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
       tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
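For illustration only (this is a hypothetical sketch, not the actual patch): since the splitter is lpeg based, a fix inside it could make the grammar consume an optional leading BOM before the first record, roughly along these lines:

-- hypothetical sketch, not the real patch: an lpeg pattern that
-- optionally consumes a leading UTF-8 BOM (bytes EF BB BF)
local P   = lpeg.P
local bom = P("\239\187\191")^-1
-- the splitter's grammar would then start matching with: bom * record * ...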
participants (2)
- Hans Hagen
- Marco Patzer