Tutorial on Computational Linguistic Phylogeny

Over the past decade or more, the use of computational techniques, many of them taken directly from biology, to estimate the evolutionary histories (phylogenies) of languages has increased tremendously. This tutorial surveys the methods and the types of linguistic data that have been used to estimate phylogenies, explains the scientific and mathematical foundations of phylogenetic estimation, and presents methodologies for evaluating a phylogeny estimation method.
