Learning Spanish-Galician Translation Equivalents Using a Comparable Corpus and a Bilingual Dictionary

So far, the extraction of translation equivalents from comparable, non-parallel corpora has received little research attention, mainly because its results have been poor compared with those obtained from aligned parallel corpora. The method proposed in this paper, which relies on seed patterns generated from external bilingual dictionaries, allows us to achieve results similar to those obtained from parallel corpora. In this way, the huge amount of comparable corpora available on the Web can be viewed as a never-ending source of lexicographic information. We describe the experiments performed on a comparable Spanish-Galician corpus.
