A Systematic Exploration of Diversity in Machine Translation

This paper addresses the problem of producing a diverse set of plausible translations. We present a simple procedure that can be used with any statistical machine translation (MT) system. We explore three ways of using diverse translations: (1) system combination, (2) discriminative reranking with rich features, and (3) a novel post-editing scenario in which multiple translations are presented to users. We find that diversity can improve performance on these tasks, especially for sentences that are difficult for MT.
