How many words are there?

The commonsensical assumption that any language has only finitely many words is shown to be false by a combination of formal and empirical arguments. Zipf's Law and related formulas are investigated and a more complex model is offered.
