A predictability-distinctiveness trade-off in the historical emergence of word forms

It has been proposed that language evolves under the joint constraints of communicative expressivity and cognitive ease. We explore this idea in the historical emergence of word forms. We hypothesize that new word forms entering the lexicon reflect a trade-off between predictability and distinctiveness. An emergent word form is highly predictable if it efficiently reuses elements of existing word forms, which keeps cognitive load low. It should also be sufficiently distinct from the existing lexicon to support communicative expressivity. We test this hypothesis by examining the properties of 34,478 word forms that emerged in Modern English over the past 200 years. We show that word forms emerging at time t + 1 are statistically bounded between two reference sets drawn at time t: word forms generated by n-gram models (highly predictable) and slang words outside the standard lexicon (highly distinctive). Our work supports a view of cognitive economy in lexical emergence.
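The abstract does not say how predictability and distinctiveness are measured. As one illustrative way to make the two ends of the trade-off concrete, the sketch below assumes (1) predictability as the mean per-transition log-probability of a form under an add-one-smoothed character bigram model trained on the existing lexicon, (2) distinctiveness as the mean edit distance to the k nearest existing forms, and (3) "n-gram generated" forms as samples from that same bigram model. The names `train_char_bigrams`, `predictability`, `distinctiveness`, and `sample_form`, the `#` boundary symbol, the 27-symbol alphabet, and `k = 3` are all hypothetical choices, not the authors' method.

```python
import math
import random
from collections import Counter, defaultdict

BOUNDARY = "#"        # hypothetical word-boundary symbol padded onto each form
ALPHABET_SIZE = 27    # assumption: lowercase a-z plus the boundary symbol


def train_char_bigrams(lexicon):
    """Count character bigram transitions over boundary-padded word forms."""
    counts = defaultdict(Counter)
    for word in lexicon:
        padded = BOUNDARY + word + BOUNDARY
        for a, b in zip(padded, padded[1:]):
            counts[a][b] += 1
    return counts


def predictability(word, counts):
    """Mean log2 probability per transition under add-one smoothing.

    Values closer to 0 mean the form reuses transitions that are frequent
    in the existing lexicon (high predictability)."""
    padded = BOUNDARY + word + BOUNDARY
    total = 0.0
    for a, b in zip(padded, padded[1:]):
        row = counts.get(a, Counter())  # unseen context: all counts are 0
        total += math.log2((row[b] + 1) / (sum(row.values()) + ALPHABET_SIZE))
    return total / (len(padded) - 1)


def levenshtein(s, t):
    """Edit distance (insertions, deletions, substitutions) via one-row DP."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]


def distinctiveness(word, lexicon, k=3):
    """Mean edit distance to the k nearest other forms (higher = more distinct)."""
    nearest = sorted(levenshtein(word, w) for w in lexicon if w != word)[:k]
    if not nearest:
        raise ValueError("lexicon contains no other forms")
    return sum(nearest) / len(nearest)


def sample_form(counts, rng, max_len=12):
    """Generate a form by walking the (unsmoothed) bigram counts from the boundary."""
    chars, current = [], BOUNDARY
    while len(chars) < max_len:
        row = counts.get(current)
        if not row:
            break
        symbols, weights = zip(*row.items())
        nxt = rng.choices(symbols, weights=weights)[0]
        if nxt == BOUNDARY:
            break
        chars.append(nxt)
        current = nxt
    return "".join(chars)


if __name__ == "__main__":
    lexicon = ["cat", "hat", "bat", "mat", "rat", "cart", "chat"]  # toy lexicon
    counts = train_char_bigrams(lexicon)
    for form in ["pat", "zyx", sample_form(counts, random.Random(0))]:
        print(form, round(predictability(form, counts), 2), distinctiveness(form, lexicon))
```

On this toy lexicon, "pat" scores as both predictable and close to existing forms, while "zyx" scores low on predictability and high on distinctiveness. Swapping in a higher-order n-gram or a different distance (for example, a phonological one) only changes the two scoring functions.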
