Challenges in detecting evolutionary forces in language change using diachronic corpora

Newberry et al. (Detecting evolutionary forces in language change, Nature 551, 2017) tackle an important but difficult problem in linguistics: testing selective theories of language change against a null model of drift. After applying a test from population genetics, the Frequency Increment Test, to a number of relevant examples, they suggest that stochasticity plays a previously under-appreciated role in language evolution. We replicate their results and find that, while the overall observation holds, the results this approach produces for individual time series can be sensitive to how the corpus is divided into temporal segments (binning). We also use a large set of simulations, in conjunction with binning, to systematically explore the range of applicability of the Frequency Increment Test. We conclude that care should be exercised when interpreting the results of tests like the Frequency Increment Test on individual series, given the researcher degrees of freedom available when applying the test to corpus data and the fundamental differences between genetic and linguistic data. Our findings have implications for selection testing and temporal binning in general, and they demonstrate the usefulness of simulations for evaluating methods newly introduced to the field.
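To make the role of binning concrete, the following is a minimal sketch and not the authors' implementation. It assumes the Frequency Increment Test takes its usual population-genetics form: a one-sample t-test on frequency increments rescaled by sqrt(2 v (1 - v) Δt), where v is the variant's relative frequency in the preceding bin. The function names `bin_counts` and `fit_statistic`, the fixed-width binning scheme, and the toy counts are all hypothetical. The sketch returns only the t statistic and its degrees of freedom; obtaining a p-value would require comparing the statistic against a Student's t distribution.

```python
import math
import statistics


def bin_counts(years, variant_counts, total_counts, bin_width):
    """Pool yearly counts into fixed-width bins (hypothetical scheme).

    Returns (bin midpoints, relative frequency of the variant per bin).
    Bins with no tokens at all are skipped.
    """
    start = min(years)
    pooled = {}
    for year, k, n in zip(years, variant_counts, total_counts):
        idx = (year - start) // bin_width
        k0, n0 = pooled.get(idx, (0, 0))
        pooled[idx] = (k0 + k, n0 + n)
    times, freqs = [], []
    for idx in sorted(pooled):
        k, n = pooled[idx]
        if n > 0:
            times.append(start + idx * bin_width + bin_width / 2)
            freqs.append(k / n)
    return times, freqs


def fit_statistic(times, freqs):
    """t statistic and degrees of freedom for rescaled frequency increments.

    Assumed form: Y_i = (v_i - v_{i-1}) / sqrt(2 v_{i-1} (1 - v_{i-1}) (t_i - t_{i-1})).
    Under drift the Y_i are expected to have mean zero.
    """
    increments = []
    for i in range(1, len(freqs)):
        v_prev, v = freqs[i - 1], freqs[i]
        dt = times[i] - times[i - 1]
        if not 0 < v_prev < 1:
            raise ValueError("rescaling is undefined at frequency 0 or 1")
        increments.append((v - v_prev) / math.sqrt(2 * v_prev * (1 - v_prev) * dt))
    n = len(increments)
    if n < 2:
        raise ValueError("need at least three bins (two increments)")
    sd = statistics.stdev(increments)
    if sd == 0:
        raise ValueError("increments have zero variance; t statistic undefined")
    t = statistics.mean(increments) / (sd / math.sqrt(n))
    return t, n - 1


# Toy data (invented for illustration): a variant rising steadily over 40 years.
years = list(range(1800, 1840))
variant_counts = [10 + (y - 1800) for y in years]
total_counts = [100] * len(years)

times, freqs = bin_counts(years, variant_counts, total_counts, bin_width=10)
t_stat, dof = fit_statistic(times, freqs)
```

Re-running the last two lines with a different `bin_width` changes both the number of increments and their values, which is the kind of researcher degree of freedom the abstract cautions about.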
