On Mon, 2 Feb 2015 10:20:15 +0100
Keith Schultz
Hello All,
As a linguist, I can say that not counting words shorter than four letters is an absolute NO-GO for an accurate word count, and thereby for an accurate character count!
See below for a non-representative proof!
On 01.02.2015 at 22:12, Wolfgang Schuster wrote:
[snip, snip]
ConTeXt has an option to count the words in a document (you find the result in <jobname>.words), but words shorter than four letters aren’t taken into account.

word length under 4 characters : 10
word length >= 4 characters    : 20
Here you are missing a third of the words! That is roughly 33%.
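For anyone who wants to repeat this check on their own text, here is a minimal Python sketch. It is not how ConTeXt counts words; the function names and the tokenization rule (runs of letters, optionally joined by an apostrophe) are my own assumptions for illustration.

```python
import re

def count_by_length(text, min_len=4):
    """Return (short, long): words shorter than min_len, and all the others.

    Assumption: a "word" is a run of letters, optionally joined by
    apostrophes. ConTeXt's own counter may tokenize differently.
    """
    words = re.findall(r"[^\W\d_]+(?:['’][^\W\d_]+)*", text)
    short = sum(1 for w in words if len(w) < min_len)
    return short, len(words) - short

def share_missed(short, long_):
    """Fraction of all words that a counter skipping short words would miss."""
    total = short + long_
    return short / total if total else 0.0

# The counts from the sample above: 10 short words, 20 longer ones.
print(f"{share_missed(10, 20):.1%}")
```

With the counts from my sample this gives the one third mentioned above; running count_by_length on a real document would show how much the threshold costs for that particular text.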
Regards,
Keith
See also:
Zipf, G. K. (1949), "Human Behavior and the Principle of Least Effort", Cambridge, MA: Addison-Wesley. In particular, Chapter 2: On the Economy of Words.
As well as:
Shannon, C. E. (1951), "The redundancy of English", Cybernetics, 248-272.
54% for English, so we can afford to be sloppy (wch s wy txt compr qte ll).
Alan