Beautiful Trees on Unstable Ground

Lexicostatistics and glottochronology had long been considered dead, but the integration of stochastic methods borrowed from genetics has brought about an unexpected revival of these scorned disciplines. The proponents of these "new quantitative methods" in historical linguistics claim that the procedures are relatively robust to errors in the data, such as wrong cognate judgments, undetected borrowings, or wrong translations. To test this claim, we investigated the differences and errors in two large lexicostatistical datasets and measured their influence on the topologies of the computed family trees. Our results clearly show that these new computational methods have not overcome the shortcomings of lexicostatistics and glottochronology. The two main problems of both disciplines, namely translating the basic concepts into the individual languages and making the cognate judgments, remain so severe that no reliable results can be drawn from these methods.
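To make the sensitivity argument concrete, here is a minimal sketch of classical lexicostatistical scoring, not the authors' procedure (their study compares tree topologies, not single dates). It computes the proportion c of meaning slots in which two languages share a cognate class and plugs it into the standard glottochronological formula t = ln(c) / (2 ln r). The retention rate r = 0.86 per millennium is an assumption (the value commonly cited for Swadesh's 100-item list), and the ten-slot data set and helper names are hypothetical. A single undetected borrowing coded as a cognate is enough to shift the estimate substantially.

```python
import math
from typing import Dict, Optional


def shared_cognate_ratio(a: Dict[str, Optional[int]], b: Dict[str, Optional[int]]) -> float:
    """Proportion c of comparable meaning slots in which a and b share a cognate class.

    Each table maps a concept (meaning slot) to a cognate-class label; equal
    labels mean "judged cognate". Slots that are None or missing in either
    language are skipped.
    """
    comparable = [k for k in a if a[k] is not None and b.get(k) is not None]
    if not comparable:
        raise ValueError("no comparable meaning slots")
    shared = sum(1 for k in comparable if a[k] == b[k])
    return shared / len(comparable)


def divergence_millennia(c: float, r: float = 0.86) -> float:
    """Classical glottochronological formula t = ln(c) / (2 ln r).

    r is an assumed retention rate per millennium (0.86 is the figure
    commonly cited for Swadesh's 100-item list).
    """
    if not 0 < c <= 1:
        raise ValueError("c must be in (0, 1]")
    return math.log(c) / (2 * math.log(r))


# Hypothetical toy data: ten meaning slots; language B shares 7 of A's classes.
lang_a = {f"concept{i}": i for i in range(10)}
lang_b = dict(lang_a)
for k in ("concept7", "concept8", "concept9"):
    lang_b[k] = 100 + int(k[-1])  # distinct, non-cognate classes

# The same data with one erroneous judgment: an undetected borrowing in
# slot 7 is wrongly coded as cognate with language A.
lang_b_error = dict(lang_b)
lang_b_error["concept7"] = lang_a["concept7"]

for label, other in (("clean", lang_b), ("one error", lang_b_error)):
    c = shared_cognate_ratio(lang_a, other)
    print(f"{label}: c = {c:.2f}, t = {divergence_millennia(c):.2f} millennia")
```

On this toy data, the one miscoded slot raises c from 0.70 to 0.80 and lowers the estimated divergence from about 1.18 to about 0.74 millennia. Real lists are longer, so each error weighs less, but errors in translation and cognate coding also accumulate across many slots and many language pairs.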