On detecting borrowing: distance-based and character-based

We propose two computational methods for detecting borrowing among a family of genetically related languages. The first, based on detecting branches of negative length in lexicostatistical trees, is shown to work poorly; as we demonstrate, it is similar to another recently proposed method that detects borrowing from skewing in lexicostatistical data. The second, which uses character-based classification techniques in common use for classifying biological taxa, is shown to be more effective. It allows borrowed characters, and the languages among which the borrowing may have taken place, to be identified; in some cases, the most likely direction of the borrowing can also be specified.
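To make the character-based idea concrete, the following is a minimal sketch and not the procedure used in the paper. It assumes a fixed rooted binary tree, treats each meaning slot as a character whose states are cognate classes, and uses Fitch's parsimony count to flag characters that need more changes on the tree than their number of states requires. Such homoplastic characters are only candidates for borrowing, since parallel innovation or an incorrect tree can produce the same signal. The language names, character names, and function names are all hypothetical.

```python
def fitch_changes(tree, states):
    """Minimum number of state changes for one character on a rooted binary tree.

    tree   -- nested 2-tuples whose leaves are language names (strings)
    states -- dict mapping each language name to its state (cognate class)
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):          # leaf: its observed state
            return {states[node]}
        left, right = node
        a, b = visit(left), visit(right)
        common = a & b
        if common:                         # children agree: no change needed here
            return common
        changes += 1                       # children disagree: one change
        return a | b

    visit(tree)
    return changes


def flag_homoplastic(tree, characters):
    """Names of characters that need more changes than (number of states - 1)."""
    flagged = []
    for name, states in characters.items():
        n_states = len(set(states.values()))
        if fitch_changes(tree, states) > n_states - 1:
            flagged.append(name)
    return flagged


# Hypothetical data: four languages, two meaning slots.
tree = (("LangA", "LangB"), ("LangC", "LangD"))
characters = {
    # Cognate classes follow the tree: one change suffices.
    "meaning_1": {"LangA": 1, "LangB": 1, "LangC": 2, "LangD": 2},
    # Cognate classes cut across the tree: two changes for two states.
    "meaning_2": {"LangA": 1, "LangB": 2, "LangC": 1, "LangD": 2},
}
candidates = flag_homoplastic(tree, characters)
```

Here `candidates` contains only `meaning_2`, whose cognate classes conflict with the assumed tree. Deciding which languages were involved, and in which direction the borrowing went, would require further analysis that this sketch does not attempt.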
