Using WordNet and Semantic Similarity for Bilingual Terminology Mining from Comparable Corpora

This paper presents an extension of the standard approach used for bilingual lexicon extraction from comparable corpora. We study the ambiguity problem revealed by the seed bilingual dictionary used to translate context vectors. For this purpose, we augment the standard approach with a Word Sense Disambiguation process relying on a WordNet-based semantic similarity measure. The aim of this process is to identify the translations that are most likely to give the best representation of words in the target language. On two specialized French-English comparable corpora, experimental results show that the proposed method consistently outperforms the standard approach.
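The disambiguation step described above can be illustrated with a minimal sketch. It assumes a Wu-Palmer style similarity computed over a tiny hand-made hypernym tree standing in for WordNet. The abstract does not say which measure or selection rule is used, so the tree, the node names, and the averaging rule in `disambiguate` are all hypothetical choices made for illustration.

```python
# Toy hypernym tree (child -> parent). Invented for illustration only;
# it stands in for WordNet and is not taken from the paper.
PARENT = {
    "entity": None,
    "organism": "entity",
    "artifact": "entity",
    "animal": "organism",
    "device": "artifact",
    "mouse_animal": "animal",
    "mouse_device": "device",
    "keyboard": "device",
    "screen": "device",
}


def path_to_root(node):
    """Return the chain of nodes from `node` up to the root (node first)."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth of a node, counting the root as depth 1."""
    return len(path_to_root(node))


def wup_similarity(a, b):
    """Wu-Palmer style score: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    # Walking up from b, the first node shared with a is the deepest common ancestor.
    lcs = next(n for n in path_to_root(b) if n in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def disambiguate(candidates, context):
    """Pick the candidate translation with the highest mean similarity
    to the already-translated, unambiguous context words (assumed rule)."""
    def mean_similarity(candidate):
        return sum(wup_similarity(candidate, w) for w in context) / len(context)
    return max(candidates, key=mean_similarity)


# Hypothetical case: French "souris" has two dictionary translations.
best = disambiguate(["mouse_animal", "mouse_device"], ["keyboard", "screen"])
print(best)
```

In this toy setting, the candidate that shares a deep common ancestor with the context words scores higher, which is the intuition behind keeping only the translations that best represent a word in the target language.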
