# [NTG-context] counting the words in a TeX document

Mojca Miklavec mojca.miklavec.lists at gmail.com
Mon Aug 7 10:24:32 CEST 2006

On 8/6/06, Aditya Mahajan wrote:
> On Sun, 6 Aug 2006, Mojca Miklavec wrote:
> > Based on those three answers I got a clearer idea of two (different,
> > but complementary) methods that might be sensible:
> >
> > a) ctxtools --wordcount filename[tex|pdf]
> > to do the wordcount for the whole document using pdftotext + ruby regexp
> >
> > b)
> > \usemodule[wordcount]
> >
> > whatever
> >
> > \startstatistics[name][words|letters|lines]
> > some more-or-less plain text
> > \stopstatistics
> >
> > whatever
> >
> > and according to Aditya's idea, run a (ruby) regular expression
> > (instead of detex) on it which would write the nicely formatted desired
> > number to the output/log file. (I don't know if it's possible to use
> > the first approach for the second problem, but it doesn't make sense
> > to complicate things too much.)
>
> If you have a script that counts words in a Context document, the
> second approach is straightforward. Write everything to a buffer and
> run the script on the buffer. However, such a mechanism will never be
> perfect (or close to perfect) in the sense of parsing arbitrary input.

The most naive solution I could think of (using a slightly modified
version of Hans's Ruby script):

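\unprotect

% \startstatistics[name][words|letters|lines] ... \stopstatistics
% (the second argument is accepted but not used yet)
\def\startstatistics
  {\dodoubleempty\dostartstatistics}

% store the content in buffer [name], run wordcount.rb on the file the
% buffer is assumed to be written to (\jobname-name.tmp), then typeset it
\def\dostartstatistics[#1][#2]#3\stopstatistics
  {\setbuffer[#1]#3\endbuffer
   \executesystemcommand{ruby wordcount.rb \jobname-#1.tmp}%
   \getbuffer[#1]}

\protect \doifnotmode{demo}{\endinput}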
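The macro calls a `wordcount.rb` that is not shown here. A hypothetical stand-in (not Hans's script) could strip the TeX markup with a few crude regular expressions and count what is left. It assumes the buffer file is UTF-8 text, and `strip_tex` and `count` are made-up names. As Aditya says, such a detexer is far from perfect: math, arguments of macros that print nothing, and text generated by macros (abbreviations, citations) will all be miscounted.

```ruby
# Hypothetical stand-in for wordcount.rb (not Hans's script).
# Usage: ruby wordcount.rb FILE [words|letters]
# Strips TeX markup with crude regexps, so the counts are only approximate.

# Remove comments, control sequences (with any [..] options) and specials.
def strip_tex(source)
  text = source.gsub(/(?<!\\)%.*$/, "")                     # comments: unescaped % to end of line
  text = text.gsub(/\\[a-zA-Z]+\*?(?:\s*\[[^\]]*\])*/, " ") # control words plus [..] arguments
  text = text.gsub(/\\./, " ")                              # control symbols such as \% or \,
  text.tr('{}$&~^_#', " ")                                  # braces, ties and other specials
end

# Count words or letters in detexed text. Lines are not supported: the number
# of typeset lines depends on TeX's line breaking, not on the source.
def count(text, mode = "words")
  case mode
  when "words"   then text.scan(/[[:alnum:]][[:alnum:]'\-]*/).length
  when "letters" then text.scan(/[[:alpha:]]/).length
  else raise ArgumentError, "unsupported mode: #{mode} (use words or letters)"
  end
end

if __FILE__ == $0 && !ARGV.empty?
  mode = ARGV[1] || "words"
  # scrub replaces invalid UTF-8 bytes, e.g. from an 8-bit encoded buffer
  text = strip_tex(File.read(ARGV[0], encoding: "UTF-8").scrub)
  puts "#{File.basename(ARGV[0])}: #{count(text, mode)} #{mode}"
end
```

To honour the `[words|letters]` option, the macro could pass `#2` as a second argument (`ruby wordcount.rb \jobname-#1.tmp #2`); with an empty `#2` the script falls back to counting words. The result is printed to standard output, not written to the log file.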

... but a friend who asked me for a favour actually wants to use
abbreviations and a bibliography as well, so only the first method
(creating the PDF first) would work. He currently copy-pastes the text
of the resulting PDF into Word and uses Word's statistics to count the
words and/or characters for him.

But I guess his wishes will have to wait a while longer.
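Method (a) avoids parsing TeX altogether, since abbreviations and the bibliography are already expanded in the PDF. A minimal standalone sketch of it, as a replacement for the copy-paste into Word, is below; it is not code from ctxtools. It assumes the `pdftotext` tool from xpdf or poppler is installed, and the script name `pdfwords.rb` and the functions `pdf_name`, `pdf_text` and `count_words` are hypothetical.

```ruby
# Hypothetical sketch of method (a): count the words of a compiled PDF.
# Assumes the pdftotext tool (xpdf or poppler) is on the PATH.
# Usage: ruby pdfwords.rb FILE.tex|FILE.pdf

# "paper.tex" -> "paper.pdf" (the PDF must already have been compiled).
def pdf_name(filename)
  File.extname(filename) == ".tex" ? filename.sub(/\.tex\z/, ".pdf") : filename
end

# Extract the text; the output file name "-" makes pdftotext write to stdout.
def pdf_text(filename)
  text = IO.popen(["pdftotext", "-enc", "UTF-8", filename, "-"]) { |io| io.read }
  text.force_encoding(Encoding::UTF_8).scrub
end

# A word is a letter or digit followed by letters, digits, apostrophes or
# hyphens. Words hyphenated at a line break ("implemen-\ntation") are joined
# first so that they count once.
def count_words(text)
  joined = text.gsub(/([[:alpha:]])-\n([[:alpha:]])/, '\1\2')
  joined.scan(/[[:alnum:]][[:alnum:]'’\-]*/).length
end

if __FILE__ == $0 && ARGV.length == 1
  puts "#{ARGV[0]}: #{count_words(pdf_text(pdf_name(ARGV[0])))} words"
end
```

Running headers, footers and page numbers end up in the extracted text too, so they are counted; the notion of a "word" will also differ somewhat from Word's.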

> ftp://tug.ctan.org/pub/tex-archive/macros/plain/contrib/misc/xii.tex
>
> But of course, you will not write anything like this in an abstract
> :-)

Nevertheless, I love the story (and esp. the document which creates it)!

All the best,
Mojca