Quantifying the dynamics of topical fluctuations in language

The availability of large diachronic corpora has provided the impetus for a growing body of quantitative research on language evolution and meaning change. The central quantities in this research are token frequencies of linguistic elements in texts, with changes in frequency taken to reflect the popularity or selective fitness of an element. However, corpus frequencies may change for a wide variety of reasons, including purely random sampling effects, or because corpora are composed of contemporary media and fiction texts within which the underlying topics ebb and flow with cultural and socio-political trends. In this work, we introduce a simple model for controlling for topical fluctuations in corpora, the topical-cultural advection model, and demonstrate how it provides a robust baseline of variability in word frequency changes over time. We validate the model on a diachronic corpus spanning two centuries and on a carefully controlled artificial language change scenario, and then use it to correct for topical fluctuations in historical time series. Finally, we use the model to show that the emergence of new words typically corresponds with the rise of a trending topic. This suggests that some lexical innovations occur due to growing communicative need in a subspace of the lexicon, and that the topical-cultural advection model can be used to quantify this.


[106]  R. Alexander Bentley,et al.  Random Drift versus Selection in Academic Vocabulary: An Evolutionary Analysis of Published Keywords , 2008, PloS one.