Text Generation by Learning from Demonstrations

Current approaches to text generation largely rely on autoregressive models and maximum likelihood estimation. This paradigm leads to (i) diverse but low-quality samples due to a mismatch between the learning objective and the evaluation metric (likelihood vs. quality) and (ii) exposure bias due to mismatched history distributions (gold vs. model-generated). To alleviate these problems, we frame text generation as a reinforcement learning (RL) problem with expert demonstrations (i.e., the training data), where the goal is to maximize quality given model-generated histories. Prior RL approaches to generation often face optimization issues due to the large action space and sparse reward. We propose GOLD (generation by off-policy learning from demonstrations): an algorithm that learns from the off-policy demonstrations by importance weighting and does not suffer from degenerate solutions. We find that GOLD outperforms the baselines according to automatic and human evaluation on summarization, question generation, and machine translation, including attaining state-of-the-art results for CNN/DailyMail summarization. Further, we show that models trained by GOLD are less sensitive to decoding algorithms, and their generation quality degrades less as the output length increases.
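
To make the central idea concrete, below is a minimal sketch of what an importance-weighted, off-policy training objective of this kind might look like in PyTorch. The function name, the weight floor, and the padding convention are illustrative assumptions, not the paper's exact formulation; the key point is that each demonstration token's log-likelihood is reweighted by the model's own (detached) probability of that token, rather than weighted uniformly as in MLE.

```python
import torch
import torch.nn.functional as F

def gold_style_loss(logits, targets, pad_id=0, weight_floor=0.1):
    """Sketch of an importance-weighted off-policy loss (illustrative, not
    the paper's exact objective).

    logits:  (batch, seq_len, vocab) scores from the autoregressive model
    targets: (batch, seq_len) demonstration (reference) tokens
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # log-probability the model assigns to each demonstration token
    tok_log_probs = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # importance weight: the model's own detached probability of the token,
    # floored (an assumed heuristic) so low-probability tokens still learn
    weights = tok_log_probs.detach().exp().clamp(min=weight_floor)
    mask = (targets != pad_id).float()
    # upweight tokens the model is already confident about and downweight
    # the rest, instead of weighting all reference tokens equally as in MLE
    loss = -(weights * tok_log_probs * mask).sum() / mask.sum()
    return loss
```

In this sketch the reweighting trades a little diversity for quality: the model concentrates learning on demonstration tokens it can already reproduce with reasonable confidence, which is the intuition behind learning from off-policy demonstrations described above.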
