Learning bilingual lexicons from comparable English and Spanish corpora

Research on extracting word translations from comparable, non-parallel texts has not been very popular, because it produces poor results compared with those obtained from aligned parallel corpora. Whereas word translation extraction from parallel texts can reach about 99% accuracy, accuracy on comparable corpora has so far been around 72%. The approach presented here relies not on a bilingual dictionary but on bilingual information previously extracted from parallel corpora, and it yields a significant improvement: about 79% of word translations are identified correctly.
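The abstract does not spell out the algorithm. As a rough illustration only, the sketch below shows the generic context-vector technique common in comparable-corpus work: source-language context vectors are mapped into the target language through a seed lexicon (here assumed to have been extracted beforehand from parallel corpora), and target words are ranked by cosine similarity. The window-based co-occurrence counts, the cosine measure, all function names and the toy data are assumptions for illustration; the authors' own context definition and similarity measure may differ.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=2):
    """Count words co-occurring within +/- `window` tokens of each word.

    Assumption: plain window co-occurrence stands in for whatever context
    definition the paper actually uses.
    """
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def translate_vector(vec, seed_lexicon):
    """Map source-language context words to target words via the seed lexicon.

    Contexts without a lexicon entry are dropped.
    """
    out = Counter()
    for ctx, weight in vec.items():
        target = seed_lexicon.get(ctx)
        if target is not None:
            out[target] += weight
    return out


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0) for k, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(src_vec, target_vectors, seed_lexicon, k=3):
    """Return the k target words whose context vectors best match src_vec."""
    translated = translate_vector(src_vec, seed_lexicon)
    scores = [(word, cosine(translated, vec)) for word, vec in target_vectors.items()]
    # Highest similarity first; ties broken alphabetically for stable output.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


if __name__ == "__main__":
    # Hypothetical toy data: a seed lexicon assumed to come from parallel corpora.
    seed = {"drink": "beber", "cold": "fría", "glass": "vaso"}
    water = {"drink": 3, "cold": 2, "glass": 1}
    spanish = {
        "agua": {"beber": 3, "fría": 2, "vaso": 1},
        "pan": {"comer": 4, "vaso": 1},
    }
    for word, score in rank_candidates(water, spanish, seed):
        print(word, round(score, 3))
```

In this toy run the English word "water" ranks "agua" first, since its translated context vector matches exactly. Contexts missing from the seed lexicon are simply dropped, which is one reason the coverage of the lexicon extracted from parallel corpora matters for this kind of method.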
