Across language families: Genome diversity mirrors linguistic variation within Europe

ABSTRACT

Objectives: The notion that patterns of linguistic and biological variation may shed light on each other, and on population histories, dates back to Darwin's time; yet turning this intuition into a proper research program has met with serious methodological difficulties, especially in language comparison. This article takes advantage of two new tools of comparative linguistics: a refined list of Indo‐European cognate words, and a novel method of language comparison that estimates linguistic diversity from a universal inventory of grammatical polymorphisms and hence enables comparison even across different language families. We validated the method and used it to compare patterns of linguistic and genomic variation in Europe.

Materials and Methods: Two sets of linguistic distances, lexical and syntactic, were inferred from these linguistic data and compared with measures of geographic and genomic distance through a series of matrix correlation tests. Linguistic and genomic trees were also estimated and compared. TreeMix was used to infer migration episodes after the main population splits.

Results: We observed significant correlations between genomic and linguistic diversity, the latter inferred from data on both Indo‐European and non‐Indo‐European languages. Contrary to previous observations, at the European scale language proved a better predictor of genomic differences than geography. Inferred episodes of genetic admixture following the main population splits also found convincing correlates in the linguistic realm.

Discussion: These results pave the way for previously unfeasible cross‐disciplinary analyses at the worldwide scale, encompassing populations of distant language families.

Am J Phys Anthropol 157:630–640, 2015. © 2015 Wiley Periodicals, Inc.
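Matrix correlation tests of the kind mentioned in the abstract are commonly run as Mantel tests. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes each language is coded as a vector of binary grammatical parameters ("+" set, "-" unset, "0" not applicable) and uses an illustrative syntactic distance: the share of differing values among parameters set in both languages. It then correlates that matrix with a genetic distance matrix and assesses significance by randomly relabeling populations. All function names and the toy data (`PROFILES`, `GENETIC`) are hypothetical.

```python
import math
import random

# Hypothetical toy data: four languages coded on six binary grammatical
# parameters ("+" set, "-" unset, "0" not applicable).
PROFILES = [
    ["+", "+", "-", "+", "0", "-"],
    ["+", "+", "-", "-", "0", "-"],
    ["+", "-", "-", "-", "+", "-"],
    ["-", "-", "+", "-", "+", "0"],
]

# Hypothetical pairwise genetic distances (e.g. FST-like values) for the
# same four populations, listed in the same order.
GENETIC = [
    [0.000, 0.002, 0.004, 0.010],
    [0.002, 0.000, 0.003, 0.008],
    [0.004, 0.003, 0.000, 0.005],
    [0.010, 0.008, 0.005, 0.000],
]


def parametric_distance(a, b):
    """Share of differing values among parameters set (+/-) in both languages."""
    # Illustrative assumption: parameters coded "0" in either language are skipped.
    compared = [(x, y) for x, y in zip(a, b) if x != "0" and y != "0"]
    if not compared:
        return 0.0
    differences = sum(1 for x, y in compared if x != y)
    return differences / len(compared)


def distance_matrix(profiles):
    """Symmetric matrix of pairwise parametric distances."""
    n = len(profiles)
    return [[parametric_distance(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]


def upper_triangle(m, order):
    """Entries above the diagonal after reordering rows and columns jointly."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(d1, d2, permutations=999, seed=0):
    """One-sided Mantel test: observed r and permutation p-value."""
    n = len(d1)
    identity = list(range(n))
    x = upper_triangle(d1, identity)
    r_obs = pearson(x, upper_triangle(d2, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)
        # Relabel the populations of d2 only; d1 stays fixed.
        if pearson(x, upper_triangle(d2, order)) >= r_obs:
            hits += 1
    # Count the observed arrangement itself, so p is never exactly zero.
    return r_obs, (hits + 1) / (permutations + 1)


if __name__ == "__main__":
    r, p = mantel(distance_matrix(PROFILES), GENETIC)
    print(f"Mantel r = {r:.3f}, one-sided p = {p:.3f}")
```

With only four populations there are just 24 distinct relabelings, so the p-value cannot meaningfully fall much below 1/24; real comparisons need many more populations. Comparing language and geography as predictors of genomic distance would call for an extension such as a partial Mantel test, which controls for a third distance matrix.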
