Zipf's law of abbreviation as a language universal

Words that are used more frequently tend to be shorter, a regularity known as Zipf’s law of abbreviation. Here we report the broadest investigation of the law to date. In a sample of 1262 texts covering 986 different languages (about 13% of the world’s language diversity), word frequency and word length are negatively correlated in every case. In line with Zipf’s original proposal, we argue that this universal trend is likely to derive from fundamental principles of information processing and transfer.
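As an illustration of what a negative correlation between word frequency and word length means for a single text, the following minimal sketch computes a rank correlation over word types. It rests on several assumptions that the abstract does not specify: the choice of Spearman's coefficient, the regex tokenizer, length measured in characters, and the function names (`average_ranks`, `pearson`, `frequency_length_correlation`) are all hypothetical and not taken from the study.

```python
import re
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def frequency_length_correlation(text):
    """Spearman correlation between frequency and length of word types.

    Assumptions for illustration only: tokens are runs of word characters,
    case is ignored, and length is counted in characters.
    """
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    types = list(counts)
    freqs = [counts[w] for w in types]
    lengths = [len(w) for w in types]
    # Spearman's coefficient is the Pearson correlation of the ranks.
    return pearson(average_ranks(freqs), average_ranks(lengths))


if __name__ == "__main__":
    toy = "a a a a bb bb bb ccc ccc dddd"
    print(frequency_length_correlation(toy))
```

In the toy string, the most frequent type is also the shortest and the rarest is the longest, so the coefficient is at its negative extreme. Real texts give weaker negative values, and whether a given value is statistically significant would need a separate test.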
