On 2015-02-01, at 22:06, Jörg Weger wrote:
Does the character count that “wc --char <textfile>” returns include blank spaces or not? (That matters to me.) “man wc” doesn’t say.
I had hoped there was a better way than editing the output of “pdftotext” in my text editor or in LibreOffice Writer (deleting unnecessary carriage returns and spaces with regular-expression searches), both of which can produce the count I need. In fact, I had hoped that ConTeXt could count the characters and spaces it renders to PDF (is that theoretically possible?) …
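As for the first question, `wc -m` (`--chars`) counts every character, blanks included: spaces, tabs and newlines all count. A quick check, assuming a POSIX shell and ASCII input:

```shell
#!/bin/sh
# "a b" plus a trailing newline: 'a', ' ', 'b', '\n' -> 4 characters
printf 'a b\n' | wc -m
# A tab counts as well: 'a', '\t', 'b' -> 3 characters (no newline here)
printf 'a\tb' | wc -m
```

Some `wc` implementations pad the number with leading spaces; the counts are the same either way.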
I'm pretty sure you can make sed filter out blank characters, so you can just chain pdftotext, sed and wc. OTOH, here's a relevant question (and a simple answer) on SO. (It seems to count newlines, though.)

Just for fun, I've coded this in Emacs Lisp:

--8<---------------cut here---------------start------------->8---
;; Count non-blank characters in a buffer
(defun how-many-visible-chars ()
  "Count visible (i.e., other than spaces, tabs and newlines) characters in the buffer."
  (interactive)
  (let ((count 0))
    (save-excursion
      (goto-char (point-min))
      (while (not (eobp))
        ;; Count the character at point unless it is a space, tab or newline
        (unless (looking-at-p "[ \t\n]")
          (setq count (1+ count)))
        (forward-char)))
    (message "%d visible characters" count)))
--8<---------------cut here---------------end--------------->8---

It's terribly unoptimized, but I ran it on a 300+ kB file on my low-end netbook and it finished in about 2 seconds, so it's not that bad in practice. It's also not well coded: for example, it should return the number instead of displaying a message when called non-interactively, and it could take the active region into account. As a proof of concept, though, it works surprisingly well (i.e., fast).
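A minimal sketch of the pdftotext-and-filter chain, assuming a POSIX shell and the `pdftotext` tool from Poppler or Xpdf (`-` as the output file sends the text to standard output). It uses `tr` instead of `sed`, because sed works line by line and cannot easily delete the newlines themselves. The function names are hypothetical. pdftotext separates pages with form feeds, so these are treated as whitespace too.

```shell
#!/bin/sh
# Characters including spaces: every run of spaces, tabs, line breaks
# and form feeds is collapsed into a single space first (roughly what
# the regex clean-up in an editor does). May count one trailing space.
count_with_spaces() {
  tr -s ' \t\n\f' '[ *]' | wc -m
}

# Visible characters only: delete spaces, tabs, newlines and form feeds,
# like the Emacs Lisp function above (which also skips form feeds' kin:
# only spaces, tabs and newlines there).
count_visible() {
  tr -d ' \t\n\f' | wc -m
}

# Usage, with a hypothetical file name:
#   pdftotext document.pdf - | count_with_spaces
#   pdftotext document.pdf - | count_visible
```

`wc -m` counts a multibyte character such as “ö” as one character only in a UTF-8 locale. In the C locale it counts bytes.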
Greetings Jörg
Best,

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University