Bandit structured prediction for learning from partial feedback in statistical machine translation

We present an approach to structured prediction from bandit feedback, called Bandit Structured Prediction, where only the value of a task loss function at a single predicted point, instead of a correct structure, is observed in learning. We present an application to discriminative reranking in Statistical Machine Translation (SMT) where the learning algorithm only has access to a 1-BLEU loss evaluation of a predicted translation instead of obtaining a gold standard reference translation. In our experiment bandit feedback is obtained by evaluating BLEU on reference translations without revealing them to the algorithm. This can be thought of as a simulation of interactive machine translation where an SMT system is personalized by a user who provides single point feedback to predicted translations. Our experiments show that our approach improves translation quality and is comparable to approaches that employ more informative feedback in learning.

[1]  Alan L. Yuille,et al.  Probabilistic models of vision and max-margin methods , 2012 .

[2]  Shay B. Cohen,et al.  A Coactive Learning View of Online Structured Prediction in Statistical Machine Translation , 2015, CoNLL.

[3]  James C. Spall,et al.  Introduction to stochastic search and optimization - estimation, simulation, and control , 2003, Wiley-Interscience series in discrete mathematics and optimization.

[4]  James C. Spall,et al.  Introduction to Stochastic Search and Optimization. Estimation, Simulation, and Control (Spall, J.C. , 2007 .

[5]  Noah A. Smith,et al.  A Simple, Fast, and Effective Reparameterization of IBM Model 2 , 2013, NAACL.

[6]  David A. Smith,et al.  Minimum Risk Annealing for Training Log-Linear Models , 2006, ACL.

[7]  George Papandreou,et al.  Perturb-and-MAP random fields: Using discrete optimization to learn and sample from energy models , 2011, 2011 International Conference on Computer Vision.

[8]  Lin Xiao,et al.  Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback. , 2010, COLT 2010.

[9]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[10]  Li Deng,et al.  Maximum Expected BLEU Training of Phrase and Lexicon Translation Models , 2012, ACL.

[11]  John Langford,et al.  The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information , 2007, NIPS.

[12]  Stefan Riezler,et al.  On Some Pitfalls in Automatic Evaluation and Significance Testing for MT , 2005, IEEvaluation@ACL.

[13]  Alexander Shapiro,et al.  Stochastic Approximation approach to Stochastic Programming , 2013 .

[14]  Noah A. Smith,et al.  Softmax-Margin Training for Structured Log-Linear Models , 2010 .

[15]  Alon Lavie,et al.  Learning from Post-Editing: Online Model Adaptation for Statistical Machine Translation , 2014, EACL.

[16]  Ralph Weischedel,et al.  A STUDY OF TRANSLATION ERROR RATE WITH TARGETED HUMAN ANNOTATION , 2005 .

[17]  Boris Polyak,et al.  Acceleration of stochastic approximation by averaging , 1992 .

[18]  Hermann Ney,et al.  A Comparison of Update Strategies for Large-Scale Maximum Expected BLEU Training , 2015, NAACL.

[19]  Eric Moulines,et al.  Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n) , 2013, NIPS.

[20]  Philipp Koehn,et al.  Moses: Open Source Toolkit for Statistical Machine Translation , 2007, ACL.

[21]  Ben Taskar,et al.  Max-Margin Parsing , 2004, EMNLP.

[22]  Ronald de Sousa Learning to be Natural , 2000 .

[23]  A. Waibel,et al.  A real-world system for simultaneous translation of German lectures , 2013, INTERSPEECH.

[24]  Stefan Riezler,et al.  Response-based Learning for Grounded Machine Translation , 2014, ACL.

[25]  Wei Chu,et al.  Contextual Bandits with Linear Payoff Functions , 2011, AISTATS.

[26]  Ying Zhang,et al.  Online discriminative learning for machine translation with binary-valued feedback , 2014, Machine Translation.

[27]  ChapelleOlivier,et al.  Simple and Scalable Response Prediction for Display Advertising , 2014 .

[28]  John N. Tsitsiklis,et al.  Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[29]  Léon Bottou,et al.  Stochastic Learning , 2003, Advanced Lectures on Machine Learning.

[30]  Jeffrey Heer,et al.  Human Effort and Machine Learnability in Computer Aided Translation , 2014, EMNLP.

[31]  O. Nelles,et al.  An Introduction to Optimization , 1996, IEEE Antennas and Propagation Magazine.

[32]  Tim Hesterberg,et al.  Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control , 2004, Technometrics.

[33]  Hal Daumé,et al.  Domain Adaptation for Machine Translation by Mining Unseen Words , 2011, ACL.

[34]  Gunnar Rätsch,et al.  Advanced Lectures on Machine Learning , 2004, Lecture Notes in Computer Science.

[35]  Jianfeng Gao,et al.  Large-scale Expected BLEU Training of Phrase-based Reordering Models , 2014, EMNLP.

[36]  Philipp Koehn,et al.  Experiments in Domain Adaptation for Statistical Machine Translation , 2007, WMT@ACL.

[37]  Philipp Koehn,et al.  Scalable Modified Kneser-Ney Language Model Estimation , 2013, ACL.

[38]  H. Robbins Some aspects of the sequential design of experiments , 1952 .

[39]  R. Likert “Technique for the Measurement of Attitudes, A” , 2022, The SAGE Encyclopedia of Research Design.

[40]  Thorsten Joachims,et al.  Interactively optimizing information retrieval systems as a dueling bandits problem , 2009, ICML '09.

[41]  John Langford,et al.  Learning to Search Better than Your Teacher , 2015, ICML.

[42]  Mauro Cettolo,et al.  Online adaptation to post-edits for phrase-based statistical machine translation , 2014, Machine Translation.

[43]  Alon Lavie,et al.  Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability , 2011, ACL.

[44]  Franz Josef Och,et al.  Minimum Error Rate Training in Statistical Machine Translation , 2003, ACL.

[45]  Matthew G. Snover,et al.  A Study of Translation Edit Rate with Targeted Human Annotation , 2006, AMTA.

[46]  Eric Moulines,et al.  Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning , 2011, NIPS.

[47]  Adam Tauman Kalai,et al.  Online convex optimization in the bandit setting: gradient descent without a gradient , 2004, SODA '05.

[48]  Luke S. Zettlemoyer,et al.  Reinforcement Learning for Mapping Instructions to Actions , 2009, ACL.

[49]  Dan Roth,et al.  Learning from natural instructions , 2011, Machine Learning.

[50]  J. Abernethy,et al.  An Efficient Bandit Algorithm for √ T-Regret in Online Multiclass Prediction ? , 2009 .

[51]  Shai Shalev-Shwartz,et al.  Online Learning and Online Convex Optimization , 2012, Found. Trends Mach. Learn..

[52]  Jacob D. Abernethy,et al.  An Efficient Bandit Algorithm for sqrt(T) Regret in Online Multiclass Prediction? , 2009, COLT.

[53]  Peter Auer,et al.  The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..

[54]  Yishay Mansour,et al.  Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.