Using Semantic Similarity for Bilingual Lexicon Extraction from Comparable Corpora (Utilisation de la similarité sémantique pour l'extraction de lexiques bilingues à partir de corpus comparables) [in French]

This paper presents a new method for improving the standard approach to bilingual lexicon extraction from specialized comparable corpora. It addresses the polysemy of the words that make up context vectors: instead of translating a context vector with every matching dictionary entry, we use only the translations that are most likely to characterize the context vector well in the target language. Experiments on two specialized French-English comparable corpora show that the method improves on the standard approach, especially when many words are ambiguous.

Keywords: bilingual lexicon, specialized comparable corpus, semantic disambiguation, WordNet.
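The translation step described above can be illustrated with a minimal sketch. The abstract does not state the exact selection rule, so everything below is an assumption made for illustration: words with a single dictionary translation are treated as anchors, and each ambiguous word keeps the candidate with the highest average similarity to those anchors. The function `translate_context_vector`, the toy dictionary, and the hand-written similarity table `TOY_SIM` are hypothetical; in practice a WordNet-based similarity measure would take the place of `toy_similarity`.

```python
from typing import Callable, Dict, List

# Hypothetical similarity scores between target-language words.
# A WordNet-based measure would normally supply these values.
TOY_SIM = {
    frozenset(("lawyer", "court")): 0.7,
    frozenset(("lawyer", "judge")): 0.8,
    frozenset(("avocado", "court")): 0.05,
    frozenset(("avocado", "judge")): 0.05,
}


def toy_similarity(a: str, b: str) -> float:
    """Symmetric lookup in the toy table; unknown pairs score 0."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)


def translate_context_vector(
    context: Dict[str, float],
    dictionary: Dict[str, List[str]],
    similarity: Callable[[str, str], float],
) -> Dict[str, float]:
    """Translate a source context vector, keeping one translation per ambiguous word."""
    # Assumption: words with exactly one translation are reliable anchors.
    anchors = [dictionary[w][0] for w in context if len(dictionary.get(w, [])) == 1]

    target: Dict[str, float] = {}
    for word, weight in context.items():
        candidates = dictionary.get(word, [])
        if not candidates:
            continue  # words missing from the dictionary are dropped
        if len(candidates) == 1 or not anchors:
            # Nothing to disambiguate, or no anchors to disambiguate with:
            # fall back to keeping every candidate.
            chosen = candidates
        else:
            # Keep the candidate closest, on average, to the anchors.
            best = max(
                candidates,
                key=lambda c: sum(similarity(c, a) for a in anchors) / len(anchors),
            )
            chosen = [best]
        for translation in chosen:
            target[translation] = target.get(translation, 0.0) + weight
    return target


# Toy French context vector (word -> association weight) and dictionary.
context = {"avocat": 2.0, "tribunal": 1.0, "juge": 1.5}
dictionary = {
    "avocat": ["lawyer", "avocado"],
    "tribunal": ["court"],
    "juge": ["judge"],
}

translated = translate_context_vector(context, dictionary, toy_similarity)
print(translated)
```

In this toy case the ambiguous word "avocat" is translated only as "lawyer", because "avocado" is far from the anchors "court" and "judge". Translating with every dictionary entry would instead add "avocado" to the target vector with the same weight.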
