An Effective Compositional Model for Lexical Alignment

The automatic compilation of bilingual dictionaries from comparable corpora has been successful for single-word terms (SWTs), but results remain disappointing for multi-word terms (MWTs). Extending the coverage of the bilingual dictionary through compositional translation has improved the results, but the approach still shows limits for MWTs of differing syntactic structures. In this paper, we propose to bridge the gap between syntactic structures through morphological links. The results show a significant improvement in the compositional translation of MWTs, which demonstrates the effectiveness of the morphologically based method for lexical alignment.
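To make the idea concrete, the following is a minimal sketch of compositional translation with a morphological fallback. It is not the authors' implementation: the dictionary entries, the `MORPH_LINKS` table, the `TARGET_TERMS` set, and both function names are hypothetical and chosen only for illustration. Each component of a source MWT is translated through the dictionary; when a component (for example a relational adjective) is missing, a morphological link maps it to a related noun that the dictionary does cover. Candidates are then kept only if they are attested among the target-language terms.

```python
from itertools import product

# Toy French-to-English dictionary (hypothetical entries, illustration only).
BILINGUAL_DICT = {
    "cancer": ["cancer"],
    "sein": ["breast"],
    "poumon": ["lung"],
}

# Hypothetical morphological links: relational adjective -> related noun.
MORPH_LINKS = {
    "mammaire": "sein",
    "pulmonaire": "poumon",
}

# Hypothetical set of target-language terms extracted from the comparable corpus.
TARGET_TERMS = {"breast cancer", "lung cancer"}


def component_translations(word):
    """Return dictionary translations, falling back on a morphological link."""
    if word in BILINGUAL_DICT:
        return BILINGUAL_DICT[word]
    base = MORPH_LINKS.get(word)
    return BILINGUAL_DICT.get(base, []) if base else []


def translate_mwt(source_term):
    """Compose candidate translations and keep those attested in TARGET_TERMS."""
    options = [component_translations(w) for w in source_term.split()]
    if not all(options):
        return []  # at least one component could not be translated
    candidates = set()
    for combo in product(*options):
        # Try the original and the reversed word order, because source and
        # target terms may not share the same syntactic structure.
        candidates.add(" ".join(combo))
        candidates.add(" ".join(reversed(combo)))
    return sorted(c for c in candidates if c in TARGET_TERMS)
```

In this sketch, `translate_mwt("cancer mammaire")` succeeds only because `mammaire` is linked to `sein`; without the morphological link the adjective has no dictionary entry and the term is left untranslated.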
