Automated Dating of the World’s Language Families Based on Lexical Similarity

This paper describes a computerized alternative to glottochronology for estimating elapsed time since parent languages diverged into daughter languages. The method, developed by the Automated Similarity Judgment Program (ASJP) consortium, is different from glottochronology in four major respects: (1) it is automated and thus is more objective, (2) it applies a uniform analytical approach to a single database of worldwide languages, (3) it is based on lexical similarity as determined from Levenshtein (edit) distances rather than on cognate percentages, and (4) it provides a formula for date calculation that mathematically recognizes the lexical heterogeneity of individual languages, including parent languages just before their breakup into daughter languages. Automated judgments of lexical similarity for groups of related languages are calibrated with historical, epigraphic, and archaeological divergence dates for 52 language groups. The discrepancies between estimated and calibration dates are found to be on average 29% as large as the estimated dates themselves, a figure that does not differ significantly among language families. As a resource for further research that may require dates of known level of accuracy, we offer a list of ASJP time depths for nearly all the world’s recognized language families and for many subfamilies.
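The two ingredients named above, an edit-distance-based similarity and a dating formula that allows for heterogeneity at the moment of breakup, can be illustrated with a minimal sketch. It does not reproduce the ASJP procedure. The length normalization in `normalized_similarity`, the averaging over paired word lists in `mean_similarity`, and the logarithmic form in `divergence_time` are assumptions chosen for illustration, as are all function and parameter names. In particular, `s0` (similarity expected at time zero) and `r` (retention rate per unit of time) must be supplied by the caller; no calibrated values are implied here.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-symbol insertions, deletions, or
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: one minus the edit distance divided by the
    length of the longer word (an assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def mean_similarity(words_a: list[str], words_b: list[str]) -> float:
    """Average similarity over word pairs that share a meaning slot.
    Both lists are assumed to be aligned by meaning."""
    if len(words_a) != len(words_b) or not words_a:
        raise ValueError("word lists must be non-empty and of equal length")
    total = sum(normalized_similarity(x, y) for x, y in zip(words_a, words_b))
    return total / len(words_a)


def divergence_time(s: float, s0: float, r: float) -> float:
    """Assumed dating formula: t = (log s - log s0) / (2 log r).

    s  : observed similarity between two related languages
    s0 : similarity expected at time zero; a value below 1 models the
         lexical heterogeneity of the parent language at breakup
    r  : retention rate per unit of time, with 0 < r < 1
    The factor 2 reflects change accumulating along both daughter lineages.
    """
    if not (0 < s <= 1 and 0 < s0 <= 1 and 0 < r < 1):
        raise ValueError("expected 0 < s <= 1, 0 < s0 <= 1, 0 < r < 1")
    return (math.log(s) - math.log(s0)) / (2 * math.log(r))
```

Under this form, a pair of languages whose similarity equals `s0` is dated to time zero, and similarity decays by a factor of `r` squared for each unit of elapsed time. Setting `s0` to 1 would recover a formula with no allowance for heterogeneity in the parent language.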
