Listwise Ranking Functions for Statistical Machine Translation

Decision rules play an important role in the tuning and decoding steps of statistical machine translation. The traditional decision rule selects the candidate with the greatest potential from a candidate space by examining each candidate individually. However, viewing each candidate as independent imposes a serious limitation on the translation task. We instead view the problem from a ranking perspective that naturally allows the consideration of an entire list of candidates as a whole through the adoption of a listwise ranking function. Our shift from a pointwise to a listwise perspective proves to be a simple yet powerful extension to current modeling that allows arbitrary pairwise functions to be incorporated as features, whose weights can be estimated jointly with traditional ones. We further demonstrate that our formulation encompasses the minimum Bayes risk (MBR) approach, another decision rule that considers restricted listwise information, as a special case. Experiments show that our approach consistently outperforms the baseline and MBR methods across the considered test sets.
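
The contrast between the pointwise rule, MBR, and a listwise rule can be illustrated with a small sketch. This is not the paper's actual formulation, which the abstract does not spell out. It assumes a toy linear combination of each candidate's own model score and its posterior-weighted agreement with every other candidate in the list. The function names, the weights `w_point` and `w_pair`, and the `gain` matrix are all hypothetical. Under these assumptions, setting `w_point = 0` reduces the listwise rule to MBR, and setting `w_pair = 0` reduces it to the traditional pointwise rule.

```python
import math


def posterior(scores):
    """Softmax over model scores (an assumed way to weight the candidate list)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def expected_gain(scores, gain):
    """Posterior-weighted agreement of each candidate with the whole list."""
    p = posterior(scores)
    n = len(scores)
    return [sum(p[j] * gain[i][j] for j in range(n)) for i in range(n)]


def pointwise_decision(scores):
    """Traditional rule: pick the candidate with the best individual score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def mbr_decision(scores, gain):
    """MBR: pick the candidate with the highest expected gain (lowest risk)."""
    eg = expected_gain(scores, gain)
    return max(range(len(scores)), key=lambda i: eg[i])


def listwise_decision(scores, gain, w_point, w_pair):
    """Illustrative listwise rule: weighted sum of the pointwise score and a
    pairwise feature aggregated over the list. The weights are hypothetical;
    in a real system they would be tuned jointly with the other features."""
    eg = expected_gain(scores, gain)
    total = [w_point * scores[i] + w_pair * eg[i] for i in range(len(scores))]
    return max(range(len(scores)), key=lambda i: total[i])


# Toy list: candidate 0 scores highest but disagrees with the other two,
# which are close to each other (made-up similarity values in [0, 1]).
scores = [1.0, 0.9, 0.8]
gain = [
    [1.0, 0.1, 0.1],
    [0.1, 1.0, 0.9],
    [0.1, 0.9, 1.0],
]

pointwise_choice = pointwise_decision(scores)
mbr_choice = mbr_decision(scores, gain)
listwise_choice = listwise_decision(scores, gain, w_point=1.0, w_pair=1.0)
```

On this toy list the pointwise rule selects candidate 0, while MBR and the combined listwise rule both select candidate 1, because it is supported by a near-duplicate neighbour in the list.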
