An Empirical Comparison of Features and Tuning for Phrase-based Machine Translation

Scalable discriminative training methods are now broadly available for estimating phrase-based, feature-rich translation models. However, the sparse feature sets typically appearing in research evaluations are less attractive than standard dense features such as language and translation model probabilities: they often overfit, do not generalize, or require complex and slow feature extractors. This paper introduces extended features, which are more specific than dense features yet more general than lexicalized sparse features. Large-scale experiments show that extended features yield robust BLEU gains for both Arabic-English (+1.05) and Chinese-English (+0.67) relative to a strong feature-rich baseline. We also specialize the feature set to specific data domains, identify an objective function that is less prone to overfitting, and release fast, scalable, and language-independent tools for implementing the features.
