Automatic Discrimination between Cognates and Borrowings

Identifying the type of relationship between words provides a deeper insight into the history of a language and allows a better characterization of language relatedness. In this paper, we propose a computational approach for discriminating between cognates and borrowings. We show that orthographic features have discriminative power and we analyze the underlying linguistic factors that prove relevant in the classification task. To our knowledge, this is the first attempt of this kind.
