Zipf's Law and Random Texts

Random-text models have been proposed as an explanation for the power-law relationship between word frequency and rank, the so-called Zipf's law. They are generally regarded as null hypotheses rather than models in the strict sense. In this context, recent theories of language emergence and evolution take this law as given, with no need of explanation. Here, random texts and real texts are compared through (a) the so-called lexical spectrum and (b) the distribution of words having the same length. It is shown that real texts fill the lexical spectrum much more efficiently, regardless of word length, suggesting that the meaningfulness of Zipf's law is high.
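
The two comparisons named above can be illustrated with a minimal sketch. It assumes the simplest random-text model, in which characters are drawn independently and a space ends a word, and it takes the lexical spectrum to be the number of distinct words that occur exactly f times. The function names, the alphabet, and the space probability `p_space` are illustrative choices, not values taken from the paper.

```python
import random
from collections import Counter


def random_text(n_chars, alphabet="abcde", p_space=0.2, seed=0):
    """Generate a random text: each character is a space with probability
    p_space, otherwise a letter drawn uniformly from alphabet (assumed model)."""
    rng = random.Random(seed)
    chars = []
    for _ in range(n_chars):
        if rng.random() < p_space:
            chars.append(" ")
        else:
            chars.append(rng.choice(alphabet))
    return "".join(chars)


def rank_frequency(words):
    """Word frequencies sorted from most to least frequent (rank 1 first)."""
    return sorted(Counter(words).values(), reverse=True)


def lexical_spectrum(words):
    """Map each frequency f to the number of distinct words occurring f times."""
    return Counter(Counter(words).values())


def spectrum_by_length(words, length):
    """Lexical spectrum restricted to words with the given number of letters."""
    return lexical_spectrum([w for w in words if len(w) == length])


if __name__ == "__main__":
    words = random_text(200_000).split()
    print("tokens:", len(words), "distinct words:", len(set(words)))
    print("top frequencies:", rank_frequency(words)[:5])
    # Number of distinct frequency values that are actually occupied,
    # overall and for words of a single length.
    print("occupied frequencies (all):", len(lexical_spectrum(words)))
    print("occupied frequencies (length 3):", len(spectrum_by_length(words, 3)))
```

Applying the same three functions to the word list of a real text gives the corresponding curves for comparison. In this random model all words of a given length are equally probable, so their frequencies cluster around a common value; that is the kind of structure the same-length comparison is meant to expose.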
