Compositionality and lexical alignment of multi-word terms

The automatic compilation of bilingual term lists from specialized comparable corpora using lexical alignment has been successful for single-word terms (SWTs), but remains disappointing for multi-word terms (MWTs). The main reported problems are the low frequency of MWTs and the variability of their syntactic structures across the source and target languages. This paper defines a general framework for the lexical alignment of MWTs from comparable corpora that combines a compositional translation process with standard lexical context analysis. Because the compositional method, which relies on translating each lexical item, is restrictive, we introduce an extended compositional method that bridges the gap between MWTs of different syntactic structures through morphological links. We evaluated both compositional methods on a French–Japanese alignment task. The results show a significant improvement in the translation of MWTs and argue for further morphological analysis in lexical alignment.
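To make the two compositional methods concrete, the following is a minimal sketch and not the authors' implementation. It assumes lemmatized input, a toy dictionary, a toy table of morphological links (relational adjective to base noun), and a set of attested target terms. All of these names and entries are hypothetical, and English stands in for Japanese purely for readability. The basic method translates each component and keeps the reorderings attested on the target side; the extended method additionally follows a morphological link when a component has no dictionary entry.

```python
from itertools import permutations, product

# Hypothetical toy resources (illustrative only, not taken from the paper).
DICTIONARY = {
    "pression": ["pressure"],
    "sang": ["blood"],
    "fatigue": ["fatigue", "tiredness"],
    "chronique": ["chronic"],
}
# Assumed morphological links: relational adjective -> base noun.
MORPH_LINKS = {"sanguin": "sang"}
# Assumed list of candidate terms extracted from the target-language corpus.
TARGET_TERMS = {"blood pressure", "chronic fatigue"}


def translate_word(word, use_morphology):
    """Return dictionary translations, optionally via a morphological link."""
    glosses = DICTIONARY.get(word, [])
    if not glosses and use_morphology and word in MORPH_LINKS:
        glosses = DICTIONARY.get(MORPH_LINKS[word], [])
    return glosses


def compositional_candidates(lemmas, use_morphology=False):
    """Translate each component, try every order, keep attested target terms."""
    options = [translate_word(w, use_morphology) for w in lemmas]
    if not all(options):
        return []  # at least one component could not be translated
    found = set()
    for combo in product(*options):
        for order in permutations(combo):
            candidate = " ".join(order)
            if candidate in TARGET_TERMS:
                found.add(candidate)
    return sorted(found)


print(compositional_candidates(["fatigue", "chronique"]))
print(compositional_candidates(["pression", "sanguin"]))
print(compositional_candidates(["pression", "sanguin"], use_morphology=True))
```

In this toy setting the basic method fails on "pression sanguin" because the adjective has no dictionary entry, while the extended variant recovers a candidate through the noun "sang". Filtering against attested target terms is one simple way to discard ill-formed combinations; the lexical context analysis mentioned in the abstract is not modeled here.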
