Reward Augmented Maximum Likelihood for Neural Structured Prediction

A key problem in structured output prediction is direct optimization of the task reward function that matters for test evaluation. This paper presents a simple and computationally efficient approach to incorporate task reward into a maximum likelihood framework. By establishing a link between the log-likelihood and expected reward objectives, we show that an optimal regularized expected reward is achieved when the conditional distribution of the outputs given the inputs is proportional to their exponentiated scaled rewards. Accordingly, we present a framework to smooth the predictive probability of the outputs using their corresponding rewards. We optimize the conditional log-probability of augmented outputs that are sampled proportionally to their exponentiated scaled rewards. Experiments on neural sequence-to-sequence models for speech recognition and machine translation show notable improvements over a maximum likelihood baseline by using reward augmented maximum likelihood (RAML), where the rewards are defined as the negative edit distance between the outputs and the ground-truth labels.
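The result described above admits a compact statement. The following is a minimal sketch in notation assumed here rather than taken verbatim from the text: r(y, y*) denotes the task reward of an output y against the ground truth y*, \tau > 0 is the temperature that scales the reward, and p_\theta(y \mid x) is the model's conditional distribution. The optimal regularized-expected-reward solution is the exponentiated payoff distribution

    q(y \mid y^*; \tau) \;=\; \frac{\exp\{ r(y, y^*)/\tau \}}{\sum_{y'} \exp\{ r(y', y^*)/\tau \}},

and the reward augmented maximum likelihood objective replaces the single ground-truth target of ordinary log-likelihood with an expectation under this distribution,

    \mathcal{L}_{\mathrm{RAML}}(\theta; \tau) \;=\; -\sum_{(x, y^*) \in \mathcal{D}} \; \sum_{y} q(y \mid y^*; \tau) \, \log p_\theta(y \mid x),

which is optimized in practice by sampling augmented outputs y \sim q(\cdot \mid y^*; \tau) and maximizing their conditional log-probability. As \tau \to 0, q collapses onto y^* and the objective reduces to standard maximum likelihood; larger \tau spreads probability mass over outputs close to the ground truth under the reward (e.g., small edit distance).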
