Using machine learning methods to avoid the pitfall of cognates and false friends in Spanish-Portuguese word pairs

The fact that 85% of the Portuguese lexicon has Spanish cognates, and that the linguistic structures of the two languages largely coincide, is believed to be an advantage for Spanish speakers learning Portuguese. However, these similarities also have drawbacks for learners, such as the pitfall of false friends: about 20% of cognates are false. The aim of this article is to automatically identify cognates and false friends between Spanish and Portuguese in order to build dictionaries of these words. One use of such dictionaries is to support scientific writing tools, which can help lower barriers for Spanish speakers writing in Portuguese.
