Contact-tracing in cultural evolution: a Bayesian mixture model to detect geographic areas of language contact

When speakers of two or more languages interact, they are likely to influence each other: contact leaves traces in the linguistic record, which in turn can reveal geographic areas of past human interaction and migration. However, the complex, multi-dimensional nature of contact has hindered the development of a rigorous methodology for detecting its traces. Specifically, other factors may contribute to similarities between languages: inheritance (a property is passed from an ancestor to several descendant languages) and universal preference (a property is universally preferred) may both overshadow contact signals. How can we find geographic contact areas in language data while accounting for the confounding effects of inheritance and universal preference? We present sBayes, an algorithm for Bayesian clustering in the presence of confounding effects. The algorithm learns which similarities in a set of features are better accounted for by confounders and which are due to contact effects. Contact areas are free to take any shape or size, but an explicit geographic prior ensures their spatial coherence. We test the clustering method on simulated data and apply it in two case studies to reveal language contact in South America and the Balkans. Our results are supported by findings from previous studies, most of them qualitative. While we focus on the specific problem of language contact, the method can also be used to uncover other traces of shared history in cultural evolution and, more generally, to reveal latent spatial clusters in the presence of confounders.
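The idea of separating contact from its confounders can be illustrated with a minimal mixture sketch. It assumes, for illustration only, that the probability of a feature value in a language is a weighted sum of three categorical distributions (universal preference, inheritance within a family, and contact within an area). The function names, the weights, and the toy distributions below are hypothetical and are not taken from the sBayes implementation.

```python
import math


def mixture_likelihood(value, weights, p_universal, p_family, p_area):
    """Probability of one observed feature value under a three-component mixture.

    Assumption: weights = (w_universal, w_family, w_area) and sum to 1.
    Each p_* is a list of probabilities indexed by feature value.
    """
    w_u, w_f, w_a = weights
    return (w_u * p_universal[value]
            + w_f * p_family[value]
            + w_a * p_area[value])


def contact_responsibility(value, weights, p_universal, p_family, p_area):
    """Posterior share of the observation attributed to the contact component."""
    w_a = weights[2]
    total = mixture_likelihood(value, weights, p_universal, p_family, p_area)
    return w_a * p_area[value] / total


def log_likelihood(values, weights, p_universal, p_family, p_area):
    """Sum of log mixture likelihoods over the languages in one area."""
    return sum(
        math.log(mixture_likelihood(v, weights, p_universal, p_family, p_area))
        for v in values
    )


# Toy binary feature (hypothetical numbers): value 1 is rare universally
# and in the family, but common in the candidate contact area.
weights = (0.2, 0.3, 0.5)
p_universal = [0.5, 0.5]
p_family = [0.9, 0.1]
p_area = [0.1, 0.9]

share = contact_responsibility(1, weights, p_universal, p_family, p_area)
```

In this toy setting a high `share` means that the observed value is explained mostly by the area distribution rather than by the confounders. A full model would additionally infer the weights, the distributions, and the area assignments, and would place a geographic prior on the areas, none of which is attempted here.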
