Levenshtein’s Distance for Measuring Lexical Evolution Rates

The relationships between languages, shaped by extremely complex social, cultural and political factors, are assessed by an automated method in which the distance between two languages is estimated by the average normalized Levenshtein distance between words from a list of 200 meanings maximally resistant to change. A sequential process of language classification, described by random walks on the matrix of lexical distances, makes it possible to represent complex relationships between languages geometrically, in terms of distances and angles. We have tested the method on a sample of 50 Indo-European and 50 Austronesian languages. The geometric representations of language taxonomy allow us to draw accurate inferences about the most significant events of human history by tracing changes in language families through time. The Anatolian and Kurgan hypotheses of Indo-European origin and the “express train” model of Polynesian origin are discussed in detail.
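The distance measure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that each language is given as a mapping from meaning labels to a single orthographic word, that the edit distance between two words is normalized by the length of the longer word, and that meanings missing from either list are skipped. The function names `levenshtein`, `normalized_levenshtein`, `language_distance` and `distance_matrix` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn word a into word b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Edit distance divided by the length of the longer word
    (assumed normalization), so the result lies in [0, 1]."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest


def language_distance(list_a: dict[str, str], list_b: dict[str, str]) -> float:
    """Average normalized distance over the meanings present in both lists.
    Skipping meanings absent from either list is an assumption of this sketch."""
    shared = list_a.keys() & list_b.keys()
    if not shared:
        raise ValueError("the two word lists share no meanings")
    return sum(normalized_levenshtein(list_a[m], list_b[m]) for m in shared) / len(shared)


def distance_matrix(lists: dict[str, dict[str, str]]) -> tuple[list[str], list[list[float]]]:
    """Symmetric matrix of pairwise lexical distances, rows in sorted name order."""
    names = sorted(lists)
    matrix = [[language_distance(lists[x], lists[y]) for y in names] for x in names]
    return names, matrix


if __name__ == "__main__":
    # Toy two-meaning lists; a real comparison would use the full 200-meaning list.
    italian = {"one": "uno", "two": "due"}
    spanish = {"one": "uno", "two": "dos"}
    print(language_distance(italian, spanish))  # (0 + 2/3) / 2
```

The resulting matrix is the kind of input on which a random-walk classification and a geometric embedding could then be built; that later step is not sketched here.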
