Bayesian Identification of Cognates and Correspondences

This paper presents a Bayesian approach to comparing languages: identifying cognates and the regular correspondences that compose them. A simple model of language is extended to include these notions in an account of parent languages. An expression is developed for the posterior probability of child language forms given a parent language. Bayes' Theorem offers a schema for evaluating choices of cognates and correspondences to explain semantically matched data. An implementation optimising this value with gradient descent is shown to distinguish cognates from non-cognates in data from Polish and Russian.
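
As a hedged illustration of the schema described above: the symbols below are assumptions introduced here, not the paper's own notation. Let D stand for the semantically matched word-list data and H for a hypothesis, that is, a choice of cognates and correspondences together with a parent language. Bayes' Theorem then ranks competing hypotheses by the product of a likelihood and a prior, and an optimiser can search for the best-scoring hypothesis:

```latex
% Assumed notation: D = semantically matched data, H = hypothesised
% cognates, correspondences, and parent language.
P(H \mid D) \;=\; \frac{P(D \mid H)\, P(H)}{P(D)} \;\propto\; P(D \mid H)\, P(H),
\qquad
\hat{H} \;=\; \arg\max_{H} \bigl[ \log P(D \mid H) + \log P(H) \bigr]
```

Because P(D) does not depend on H, it can be ignored when comparing hypotheses, and working with log-probabilities turns the product into a sum that is easier to optimise numerically.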
