Gappy Translation Units under Left-to-Right SMT Decoding

This paper presents an extension to a bilingual n-gram statistical machine translation (SMT) system that allows translation units with gaps. Our gappy translation units can be seen as a first step towards introducing hierarchical units similar to those employed in hierarchical MT systems. Our goal is twofold. On the one hand, we aim to capture the benefits of the higher generalization power shown by hierarchical systems. On the other hand, we want to avoid the computational burden of decoding based on parsing techniques, which, among other drawbacks, make it difficult to introduce the required target language model costs. Our experiments show slight but consistent improvements for Chinese-to-English machine translation. Accuracy results are competitive with those achieved by a state-of-the-art phrase-based system.
