Grouping Multi-word Expressions According to Part-Of-Speech in Statistical Machine Translation

This paper studies a strategy for identifying and using multi-word expressions in Statistical Machine Translation. The proposed strategy is evaluated for various types of multi-word expressions (such as nouns or verbs) in terms of both alignment quality and translation accuracy. Evaluations are performed on real-life data, namely the European Parliament corpus. Results for English-to-Spanish and Spanish-to-English translation tasks are presented and discussed.
