Mojca Miklavec wrote:
Hello,
I would like to ask how difficult it would be to count the number of words in a TeX/ConTeXt document. If it's too complex, please ignore the rest of the message.
The way I do such things (and worse trickery) is by using pdftotext. You can of course use TeX, but then there can be generated words and such, and it is insane to use TeX (or adapt a TeX style) for that. It may help to run with the (nondestructive) \setupalign[nothyphenated].

Anyhow, here is a script (I could not locate my normal one):

=== wordcount.rb ===

if (file = ARGV[0]) && FileTest.file?(file) then
  begin
    # convert the pdf to plain text (argument list, so file names with spaces work)
    system("pdftotext", file, "wc.log")
    data = IO.read("wc.log")
    data.gsub!(/\d[\.\:]*\w+/) do ' ' end # remove suffixes
    data.gsub!(/\d/) do ' ' end           # remove numbers
    data.gsub!(/\-\s+/) do ' ' end        # remove hyphenation
    data.gsub!(/\-/) do ' ' end           # split compound words
    # remove punctuation and other symbols
    data.gsub!(/[\.\,\<\>\/\?\\\|\'\"\;\:\[\]\{\}\+\=\-\_\)\(\*\&\^\%\$\#\@\!\~\`]/) do ' ' end
    words = data.split                    # split on whitespace, no empty leading entry
    count = Hash.new
    words.each do |w|
      count[w] = (count[w] || 0) + 1
    end
  rescue
    puts("some error #{$!}")
  else
    puts("words : #{words.size}")
    puts("unique : #{count.size}")
    # optional per-word list, only when counting succeeded
    if ARGV[1] =~ /list/ then
      puts("\n")
      count.sort.each do |k,v|
        puts("#{k} : #{v}")
      end
    end
  end
end

usage: ruby wordcount.rb filename.pdf [list]

If this kind of stuff is useful, we can add it to one of the scripts that come with ConTeXt.

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
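
Note: the cleanup and counting steps can also be tried on a plain string, without pdftotext in the loop. The following is only a minimal sketch under a few assumptions: count_words is a hypothetical helper name (not part of any ConTeXt script), words hyphenated at a line end are rejoined rather than split, and anything that is not a letter is treated as a separator instead of listing the punctuation characters one by one.

```ruby
# Hypothetical helper, not from the original script: counts words in a plain string.
def count_words(text)
  data = text.gsub(/\d[.:]*\w+/, ' ')       # drop numbers together with attached suffixes ("2nd")
  data = data.gsub(/-\s*\n\s*/, '')         # assumption: rejoin words hyphenated at a line end
  data = data.gsub(/[^[:alpha:]\s]/, ' ')   # every non-letter (digits, punctuation, hyphens) separates words
  words = data.split                        # split on whitespace, never yields empty strings
  counts = Hash.new(0)
  words.each { |w| counts[w] += 1 }         # case-sensitive, like the script above
  { total: words.size, unique: counts.size, counts: counts }
end

result = count_words("The cat and the cat-flap, 2nd edi-\ntion.")
puts("words  : #{result[:total]}")   # words  : 7
puts("unique : #{result[:unique]}")  # unique : 6
```

Counting stays case-sensitive here ("The" and "the" are different entries); downcasing the text first would merge them, at the cost of also merging proper nouns with ordinary words.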