Length-Frequency Statistics for Written English

The results of a tabulation of word frequencies in a sample of written English are analyzed in terms of word length and syntactic function. It is found that a simple stochastic model gives a rough prediction for the results obtained when all words are combined, but not when words are classified as function or content words. Function words are short and their frequency of occurrence is a decreasing function of their length; content words are longer and their probability is relatively independent of length.
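
The kind of tabulation described above can be illustrated with a minimal sketch. It counts word tokens by length, separately for function and content words. The names `FUNCTION_WORDS`, `length_frequency`, and `relative_frequencies` are hypothetical. The ten-word function list is a deliberately small stand-in, not the classification used in the study, and the stochastic model itself is not reproduced here.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (an assumption for
# illustration). A real tabulation would need a much fuller inventory of
# articles, prepositions, pronouns, conjunctions, and auxiliaries.
FUNCTION_WORDS = {"the", "of", "and", "a", "to", "in", "is", "it", "that", "on"}


def length_frequency(text):
    """Count word tokens by length, split into function and content words.

    Returns a dict with keys 'function' and 'content', each a Counter
    mapping word length (in letters) to the number of tokens of that length.
    """
    tables = {"function": Counter(), "content": Counter()}
    # Lowercase the text and keep only runs of ASCII letters as words.
    for word in re.findall(r"[a-z]+", text.lower()):
        kind = "function" if word in FUNCTION_WORDS else "content"
        tables[kind][len(word)] += 1
    return tables


def relative_frequencies(counter):
    """Convert raw counts into relative frequencies, ordered by word length."""
    total = sum(counter.values())
    if total == 0:
        return {}
    return {length: n / total for length, n in sorted(counter.items())}


if __name__ == "__main__":
    sample = "The cat sat on the mat and it is a cat"
    tables = length_frequency(sample)
    for kind, counts in tables.items():
        print(kind, relative_frequencies(counts))
```

Applied to a large sample, the two resulting tables could be compared directly: the abstract's claim corresponds to the function-word frequencies falling off with length while the content-word frequencies stay comparatively flat.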