Zipf's law in 50 languages: its structural pattern, linguistic interpretation, and cognitive motivation

Zipf's law has been reported in many fields of human activity, including language, where the frequency of a word is consistently found to be a power-law function of its frequency rank. However, it remains disputed whether it is a universal law or a statistical artifact, and little is known about the mechanisms that may have shaped it. To address these questions, this study conducted a large-scale cross-language investigation of Zipf's law. The statistical results show that the word frequency distributions of all 50 languages share a three-segment structural pattern, with each segment demonstrating distinctive linguistic properties and the lower segment invariably bending downwards, deviating from the theoretical expectation. This finding indicates that the deviation is a fundamental and universal feature of word frequency distributions in natural languages rather than a statistical error of low-frequency words. A computer simulation based on dual-process theory yields Zipf's law with the same structural pattern, suggesting that Zipf's law in natural languages is motivated by common cognitive mechanisms, such as dual processing, that govern human verbal behavior.
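
The rank-frequency relation stated in the abstract, f(r) ∝ r^(-α), can be illustrated with a minimal Python sketch. This is not the study's code: the function names, the toy corpus, and the use of ordinary least squares on log-log axes are all assumptions made for illustration. It fits a single exponent and does not attempt the three-segment analysis.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return word frequencies sorted from most to least frequent (rank 1 first)."""
    counts = Counter(tokens)
    return sorted(counts.values(), reverse=True)


def fit_zipf_exponent(freqs):
    """Estimate alpha in f(r) ~ C * r**(-alpha) by least squares on log-log axes."""
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is negative for a decaying power law; alpha is its magnitude.
    return -(sxy / sxx)


# Hypothetical toy corpus: the word at rank r occurs about 1000 / r times,
# so the estimated exponent should come out close to 1.
toy_tokens = []
for r in range(1, 51):
    toy_tokens.extend([f"w{r}"] * round(1000 / r))

freqs = rank_frequency(toy_tokens)
alpha = fit_zipf_exponent(freqs)
print(f"estimated exponent: {alpha:.2f}")
```

On real corpora a single straight-line fit of this kind would hide any bending of the low-frequency tail, which is the deviation the abstract describes. Fitting separate rank ranges would be one way to probe it.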
