The CMU Machine Translation Systems at WMT 2014

We describe the CMU systems submitted to the 2014 WMT shared translation task. We participated in two language pairs, German–English and Hindi–English. Our innovations include a label coarsening scheme for syntactic tree-to-tree translation, a host of new discriminative features, several modules that create “synthetic translation options” able to generalize beyond what is directly observed in the training data, and a method for combining the output of multiple word aligners to uncover extra phrase pairs and grammar rules.
