CSV scanners built in ConTeXt - feature or bug?
Hi ConTeXist,

A few days ago Hans pointed me to the built-in CSV splitter. I tested it, and it will certainly come in handy for my needs. I found that if the CSV file contains a blank line, processing of the file stops there (see my minimal example). It is clear to me that a malformed file (e.g. a different number of columns in different rows) may interrupt processing. However, I want to ask whether it is possible to process a CSV file that contains blank lines through to the end of the file. I have noticed that when I export data from Excel, a blank line sometimes appears in the exported file.

Is this interruption a feature of the built-in splitter, or is it a bug? Could it be fixed, or could new functionality be added?

Thanx
Jaroslav Hajtmar

Here is a minimal example:

\starttext
\startluacode
local mycsvsplitter = utilities.parsers.rfc4180splitter{
    separator = ",",
    quote = '"',
}

local crap = io.loaddata("data.txt")

-- with header variant
local tablerows, columnname = mycsvsplitter(crap,true)
inspect(tablerows)
inspect(columnname)

-- without header variant
-- local tablerows = mycsvsplitter(crap)
-- inspect(tablerows)

for i=1,#tablerows do
    local l = tablerows[i]
    for j=1,#l do
        context(l[j]..", ")
    end
    context('\\crlf')
end
\stopluacode
\stoptext

% <-------------- here start data.txt file ---------------------->
first,second,third,fourth
1,"2","3","4"
"a","b","c","d"
"foo","bar""baz","boogie","xyzzy"
" "," "," "," "
"And now","followed by","several","blank lines"



"After several","empty rows","data continues","here"
11,"22","33","44"
"aa","bb","cc","dd"
% <-------------- and here stop data.txt file ---------------------->
On 2/26/2015 1:40 AM, Jaroslav Hajtmar wrote:
> \starttext
> \startluacode
> local mycsvsplitter = utilities.parsers.rfc4180splitter{
>     separator = ",",
>     quote = '"',
> }
>
> local crap = io.loaddata("data.txt")
>
> -- with header variant
> local tablerows, columnname = mycsvsplitter(crap,true)
> inspect(tablerows)
> inspect(columnname)
>
> -- without header variant
> -- local tablerows = mycsvsplitter(crap)
> -- inspect(tablerows)
>
> for i=1,#tablerows do
>     local l = tablerows[i]
>     for j=1,#l do
>         context(l[j]..", ")
>     end
>     context('\\crlf')
> end
> \stopluacode
> \stoptext

line 527 in util-prs.lua:

local wholeblob = Ct((newline^(specification.strict and -1 or 1) * record)^0)

should do the trick

i'm not sure about the default, as the standard might demand quit at empty line, so that needs to be figured out (not by me, therefore by you)

\starttext
\startluacode
local crap = [[
1,"2","3","4"
"a","b","c","d"
"foo","bar""baz","boogie","xyzzy"
" "," "," "," "
"And now","followed by","several","blank lines"



1,"2","3","4"
"a","b","c","d"
"foo","bar""baz","boogie","xyzzy"
" "," "," "," "
]]

local mycsvsplitter = {
    utilities.parsers.rfc4180splitter{
        separator = ",",
        quote = '"',
        strict = true,
    },
    utilities.parsers.rfc4180splitter{
        separator = ",",
        quote = '"',
    }
}

for i=1,#mycsvsplitter do
    local tablerows, columnname = mycsvsplitter[i](crap,true)
    context.formatted.title("Case %s",i)
    for i=1,#tablerows do
        local l = tablerows[i]
        for j=1,#l do
            context(l[j]..", ")
        end
        context('\\crlf')
    end
end
\stopluacode
\stoptext

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
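
To see what the exponent in the pattern above changes, here is a minimal sketch using plain LPeg. It is not the util-prs.lua code: the grammar is a simplified stand-in (unquoted, non-empty fields only), and the names makesplitter, strictrows and lenientrows are made up for the illustration. In LPeg, patt^-1 matches at most one occurrence and patt^1 matches one or more, so the strict variant gives up at the first blank line while the other one skips over runs of line breaks.

```lua
-- Simplified stand-in for the splitter grammar (an assumption for this
-- sketch, not the actual util-prs.lua implementation).
local lpeg = lpeg or require("lpeg")   -- ConTeXt's Lua already provides lpeg
local P, C, Ct = lpeg.P, lpeg.C, lpeg.Ct

local newline = P("\r\n") + P("\n") + P("\r")
local comma   = P(",")
local field   = C((1 - comma - newline)^1)    -- unquoted, non-empty fields only
local record  = Ct(field * (comma * field)^0)

-- hypothetical helper: build a row splitter in strict or lenient mode
local function makesplitter(strict)
    -- strict:  newline^-1, at most one line break between records
    -- lenient: newline^1,  one or more line breaks between records
    local gap = newline^(strict and -1 or 1)
    -- the first record is matched on its own because newline^1 needs
    -- at least one line break in front of every following record
    return Ct(record * (gap * record)^0)
end

local data = "a,b\nc,d\n\n\ne,f\n"

local strictrows  = makesplitter(true):match(data)
local lenientrows = makesplitter(false):match(data)

print(#strictrows, #lenientrows)   -- 2 rows versus 3 rows
```

With this input the strict variant returns only the two rows before the blank lines, and the lenient one also returns the row that follows them.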
On Thu, 26 Feb 2015 10:47:29 +0100
Hans Hagen wrote:

> i'm not sure about the default as the standard might demand quit at empty line so that needs to be figured out (not by me therefore by you)

Stopping on an empty line is a very MetaPost-like feature. I don't have an opinion as to what should be expected for CSV, but MP thus allows one to pick off sets of data by successive reads from one file.

Alan
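
The idea of picking off sets of data could be approximated on the Lua side by cutting the file into blocks at blank lines before parsing each block. The following is only a sketch in plain Lua: splitblocks is a hypothetical helper, not part of ConTeXt, and it assumes that no quoted field contains a blank line of its own.

```lua
-- Hypothetical helper (not a ConTeXt function): split a text into blocks
-- that are separated by one or more blank lines.
local function splitblocks(text)
    local blocks, current = { }, { }
    -- normalise line endings so that only "\n" remains
    text = text:gsub("\r\n?", "\n")
    -- walk the text line by line; the appended "\n" closes the last line
    for line in (text .. "\n"):gmatch("(.-)\n") do
        if line:match("^%s*$") then
            -- a blank line ends the current block, if there is one
            if #current > 0 then
                blocks[#blocks+1] = table.concat(current, "\n")
                current = { }
            end
        else
            current[#current+1] = line
        end
    end
    -- flush a block that is not followed by a blank line
    if #current > 0 then
        blocks[#blocks+1] = table.concat(current, "\n")
    end
    return blocks
end

local blocks = splitblocks("a,b\nc,d\n\n\ne,f\n")
for i = 1, #blocks do
    print(i, blocks[i])
end
```

Each entry of blocks could then be handed to a splitter separately, which keeps the data sets apart instead of merging them into one table.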

Hans and Alan,

Thanks for the replies. Now it works properly. I would like to ask whether you are planning to fix util-prs.lua in a future release of standalone ConTeXt.

As for the default, I would rather vote for the other option, i.e. the one that does not stop on a blank line, unlike the current setup. (Personally, I would prefer that strict=true meant "process all lines of the CSV file" and strict=false meant "stop processing at a blank line".) But I will accept whichever alternative is chosen, and I will take these options into account in my own library.

Alan describes this behaviour as a MetaPost-like feature. Personally, I think that a CSV file is basically a plain text file, and a blank line has its place in it. The end of a text file is usually marked by an <eof> character, so I see no reason to terminate processing before the file really ends.

Jaroslav Hajtmar

On 26.2.2015 at 12:20, Alan BRASLAU wrote:
> On Thu, 26 Feb 2015 10:47:29 +0100
> Hans Hagen wrote:
>
>> i'm not sure about the default as the standard might demand quit at empty line so that needs to be figured out (not by me therefore by you)
>
> Stop on empty line is a very MetaPost-like feature. I don't have an opinion as to what should be expected for CSV, but MP thus allows one to pick-off sets of data by successive reads to one file.
>
> Alan