[NTG-context] PDF document statistics (character count incl. spaces)?
alan.braslau at cea.fr
Mon Feb 2 16:39:47 CET 2015
On Mon, 2 Feb 2015 10:20:15 +0100
Keith Schultz <keithjschultz at icloud.com> wrote:
> Hello All,
> As a linguist, I can say that not counting the shorter words is
> an absolute NO-GO for an accurate word count, and thereby for an
> accurate character count. See below for a non-representative proof!
> > On 01.02.2015 at 22:12, Wolfgang Schuster
> > <schuster.wolfgang at gmail.com> wrote:
> [snip, snip]
> > ConTeXt has an option to count the words (you find the result in
> > <jobname>.words) in a document, but words shorter than four
> > letters aren’t taken into account.
> word length under 4 characters : 10
> word length >= 4 characters : 20
> Here you are missing a third of the words! That is about 33%.
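
The arithmetic behind the quoted figures can be checked with a minimal Python sketch. It assumes words are split on whitespace and stripped of surrounding punctuation, which is not necessarily how ConTeXt tokenizes a document; the function name short_word_share is hypothetical.

```python
def short_word_share(text, min_length=4):
    """Return (short, counted, share_missed) for the words in text.

    Assumption: a "word" is a whitespace-separated token with
    surrounding punctuation stripped. A real word counter may
    tokenize differently.
    """
    words = text.split()
    # Words shorter than min_length are the ones a counter with
    # this threshold would skip.
    short = sum(1 for w in words if len(w.strip(".,;:!?\"'()")) < min_length)
    counted = len(words) - short
    share_missed = short / len(words) if words else 0.0
    return short, counted, share_missed


# The figures quoted above: 10 short words, 20 words of 4+ characters.
short, counted = 10, 20
print(f"{short / (short + counted):.1%}")  # 33.3%
```

Calling short_word_share on a real paragraph gives the share of words that a four-letter threshold would leave out of the count.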
Zipf, G. K. (1949), "Human Behavior and the Principle of Least Effort",
Cambridge, MA: Addison-Wesley.
in particular, Chapter 2: On the Economy of Words.
As well as:
Shannon, C. E. (1951), "The redundancy of English", Cybernetics,
54% for English, so we can afford to be sloppy (wch s wy txt compr qte