Improved Statistical Machine Translation Using MultiWord Expressions

Identifying and translating MultiWord Expressions (MWEs) in text is a challenge for numerous applications in Natural Language Processing (NLP), as MWEs appear in all text genres and pose significant problems for every kind of NLP task. In this paper, we describe a hybrid approach for extracting contiguous MWEs and their translations from a French-English parallel corpus. We evaluate both the alignment and the translation quality. Next, we implement a method that integrates these units into Moses, the state-of-the-art Machine Translation (MT) system. Our experiments show that MWEs improve translation performance.
