Co-occurrence of the Benford-like and Zipf Laws Arising from the Texts Representing Human and Artificial Languages

We demonstrate that large texts representing human (English, Russian, Ukrainian) and artificial (C++, Java) languages display quantitative patterns characterized by the Benford-like and Zipf laws. Under the Zipf law, the frequency of a word is inversely proportional to its rank, whereas the total number of occurrences of each word in the text generates an uneven, Benford-like distribution of leading digits. Excluding the most frequent words substantially improves the correlation of the actual textual data with the Zipfian distribution, whereas the Benford-like distribution of leading digits (arising from the total number of occurrences of each word) is insensitive to the same elimination procedure. The absolute values of the slopes of the double-logarithmic plots for artificial languages (C++, Java) are markedly larger than those for human ones.
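The two measurements described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the tokenization rule, the least-squares fit, the decision to keep original ranks after dropping the top words, and all function names (`word_counts`, `zipf_slope`, `leading_digit_shares`, `benford_expected`) are assumptions made for illustration. The classical Benford law, log10(1 + 1/d), is used as the reference distribution; the "Benford-like" distribution of the paper may differ from it.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Count words; a 'word' here is any run of Unicode letters (an assumption)."""
    return Counter(re.findall(r"[^\W\d_]+", text.lower()))


def zipf_slope(counts, skip_top=0):
    """Least-squares slope of log10(frequency) against log10(rank).

    The skip_top most frequent words are dropped from the fit; the
    remaining words keep their original ranks (an assumption).
    """
    freqs = sorted(counts.values(), reverse=True)
    points = [(math.log10(rank), math.log10(freq))
              for rank, freq in enumerate(freqs, start=1)
              if rank > skip_top]
    if len(points) < 2:
        raise ValueError("need at least two ranked words to fit a slope")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def leading_digit_shares(counts):
    """Share of words whose occurrence count starts with each digit 1..9."""
    digits = Counter(int(str(c)[0]) for c in counts.values())
    total = sum(digits.values())
    return {d: digits.get(d, 0) / total for d in range(1, 10)}


def benford_expected():
    """Classical Benford probabilities, P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}
```

For a text loaded into a string `text`, comparing `zipf_slope(word_counts(text))` with `zipf_slope(word_counts(text), skip_top=k)` for some small `k` mirrors the elimination procedure, and `leading_digit_shares(word_counts(text))` can be set against `benford_expected()`. An ideal Zipfian text would give a slope close to -1.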
