Extracting Bilingual Dictionary from Comparable Corpora with Dependency Heterogeneity

This paper proposes an approach for bilingual dictionary extraction from comparable corpora. The approach is based on the observation that a word and its translation share similar dependency relations. Experimental results on 250 randomly selected translation pairs show that the proposed approach significantly outperforms the traditional context-based approach, which uses a bag of words around translation candidates.
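
The observation that a word and its translation share similar dependency relations can be illustrated with a small sketch. The code below is not the paper's dependency heterogeneity measure; it assumes a generic cosine similarity over dependency features as a stand-in. The function names (`relation_profile`, `cosine`), the seed dictionary, and the toy English and Spanish triples are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt

def relation_profile(triples, word, seed=None):
    """Count (relation, role, neighbour) features for `word`.

    triples: iterable of (head, relation, dependent) tuples from a parsed corpus.
    seed: optional dict mapping neighbour words into the other language;
          neighbours missing from it are skipped (hypothetical seed lexicon).
    """
    profile = Counter()
    for head, rel, dep in triples:
        if head == word:
            other, role = dep, "as_head"   # `word` governs the neighbour
        elif dep == word:
            other, role = head, "as_dep"   # `word` depends on the neighbour
        else:
            continue
        if seed is not None:
            if other not in seed:
                continue
            other = seed[other]
        profile[(rel, role, other)] += 1
    return profile

def cosine(p, q):
    """Cosine similarity between two sparse feature-count profiles."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

# Toy dependency triples (assumed data, not from the paper).
english = [("drink", "obj", "water"), ("drink", "obj", "tea"), ("read", "obj", "book")]
spanish = [("beber", "obj", "agua"), ("leer", "obj", "libro")]
seed = {"beber": "drink", "leer": "read"}  # hypothetical seed dictionary

agua = relation_profile(spanish, "agua", seed)
score_water = cosine(agua, relation_profile(english, "water"))
score_book = cosine(agua, relation_profile(english, "book"))
```

In this toy setting, "agua" and "water" both occur as the object of a verb that the seed dictionary links, so `score_water` is higher than `score_book`. A bag-of-words baseline would instead count all nearby words regardless of their syntactic relation to the candidate.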
