Zipf’s law outside the middle range

Zipf (1949) already noted that the linear relationship he observed between log frequency and log rank is strongest in the middle range: both very high and very low frequency items tend to deviate from the log-log regression line. This paper investigates the causes of such deviations and offers a more detailed statistical model. The subgeometric mean property of frequency counts is introduced and used to prove that the size of the vocabulary tends to infinity as sample size increases without bound.
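The log-log regression line mentioned above can be illustrated with a minimal sketch. It is not taken from the paper: the function names are hypothetical, and ordinary least squares is assumed here only as a simple stand-in for fitting log frequency against log rank, not as the paper's statistical model.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, frequency) pairs, with rank 1 for the most frequent item."""
    counts = Counter(tokens)
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def loglog_fit(pairs):
    """Ordinary least-squares fit of log(frequency) on log(rank).

    Assumes at least two distinct ranks. Returns (slope, intercept).
    """
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(freq) for _, freq in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(pairs, slope, intercept):
    """Deviation of each item's log frequency from the fitted line."""
    return [
        math.log(freq) - (intercept + slope * math.log(rank))
        for rank, freq in pairs
    ]
```

On a real corpus, one would call `rank_frequency` on the token list, fit the line, and inspect `residuals` at the lowest and highest ranks, where the abstract says deviations from the line tend to appear.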

[1]  J. Willis. Age and Area, 1926, The Quarterly Review of Biology.

[2]  H. Simon, et al. On a Class of Skew Distribution Functions, 1955.

[3]  Benoit B. Mandelbrot, et al. Post Scriptum to "Final Note", 1961, Inf. Control.

[4]  I. Good. The Population Frequencies of Species and the Estimation of Population Parameters, 1953.

[5]  David M. W. Powers, et al. Applications and Explanations of Zipf’s Law, 1998, CoNLL.

[6]  Herbert A. Simon, et al. Some Further Notes on a Class of Skew Distribution Functions, 1960, Inf. Control.

[7]  J. Marchal. Cours d'economie politique, 1950.

[8]  Herbert A. Simon, et al. Reply to Dr. Mandelbrot's Post Scriptum, 1961, Inf. Control.

[9]  Benoit B. Mandelbrot, et al. A Note on a Class of Skew Distribution Functions: Analysis and Critique of a Paper by H. A. Simon, 1959, Inf. Control.

[10]  R. Harald Baayen, et al. How Variable May a Constant Be? Measures of Lexical Richness in Perspective, 1998, Comput. Humanit.

[11]  Herbert A. Simon. Reply to "Final Note" by Benoit Mandelbrot, 1961, Inf. Control.

[12]  A. Folsom. Reply to Dr, 2004.

[13]  David Thomas, et al. The Art in Computer Programming, 2001.

[14]  John Burrows, et al. Word-Patterns and Story-Shapes: The Statistical Analysis of Narrative Style, 1987.

[15]  George Kingsley Zipf, et al. Human Behavior and the Principle of Least Effort, 1949.

[16]  Christer Samuelsson. Relating Turing's Formula and Zipf's Law, 1996, VLC@COLING.