Paper information - Automatic Detection of Orthographic Cues for Cognate Recognition

Automatic Detection of Orthographic Cues for Cognate Recognition

Present-day machine translation technologies crucially depend on the size and quality of lexical resources. Much recent research in the area has been concerned with methods for building bilingual dictionaries automatically. In this paper we propose a methodology for the automatic detection of cognates between two languages based solely on the orthography of words. From a set of known cognates, the method induces rules capturing regularities in the orthographic mutations that a word undergoes when migrating from one language into the other. The rules are then applied as a preprocessing step before measuring the orthographic similarity between putative cognates. As a result, the method improves the F-measure by 11.86% compared with detecting cognates based only on the edit distance between them.
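The pipeline described in the abstract (rewrite the source word with orthographic rules, then score the pair by edit distance) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the rule list `RULES`, the threshold of 0.7, the length-based normalisation, and the function names are all assumptions made for the example, and the rules are written by hand here, whereas the paper induces them from known cognate pairs.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance normalised to [0, 1]; 1.0 means identical strings.

    Dividing by the longer word's length is an assumption of this sketch.
    """
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


# Hypothetical source-to-target rewrite rules, hand-written for illustration.
# The paper induces such rules automatically from a set of known cognates.
RULES = [("ph", "f"), ("c", "k")]


def apply_rules(word: str, rules=RULES) -> str:
    """Apply each rewrite rule in order to every occurrence in the word."""
    for src, tgt in rules:
        word = word.replace(src, tgt)
    return word


def is_cognate(source: str, target: str,
               threshold: float = 0.7, rules=RULES) -> bool:
    """Preprocess the source word with the rules, then threshold similarity."""
    return similarity(apply_rules(source, rules), target) >= threshold
```

With these illustrative rules, `is_cognate("photo", "foto")` is true, while the same pair scored without preprocessing (`rules=[]`) falls below the threshold. That is the kind of gain over plain edit distance the abstract reports.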

Viktor Pekar | Andrea Mulloni
