Deep Reinforcement Learning for Sequence-to-Sequence Models

In recent years, sequence-to-sequence (seq2seq) models have become widely popular and deliver state-of-the-art performance on a broad range of tasks, such as machine translation, headline generation, text summarization, speech-to-text conversion, and image caption generation. The underlying framework of these models is usually a deep neural network comprising an encoder and a decoder. Although simple encoder–decoder models already produce competitive results, researchers have proposed many improvements to them, e.g., attention over the input, pointer-generator models, and self-attention models. However, such seq2seq models share two common problems: 1) exposure bias, since the decoder is trained on ground-truth previous tokens but must condition on its own predictions at test time, and 2) inconsistency between the training objective (typically cross-entropy) and the discrete metrics used at test time (such as ROUGE or BLEU). Recently, a new perspective on these two problems has emerged that leverages methods from reinforcement learning (RL). In this survey, we view seq2seq problems from the RL perspective and provide a formulation that combines the decision-making strength of RL methods with the ability of seq2seq models to retain long-term memory. We present some of the most recent frameworks that combine concepts from RL and deep neural networks. Our work aims to provide insight into problems that inherently arise with current approaches and into how better RL models can address them. We also provide source code implementing most of the RL models discussed in this paper for the complex task of abstractive text summarization, along with targeted experiments that compare these models in terms of both performance and training time.
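
The core RL idea can be sketched in a few lines. The decoder is treated as a policy, its own sampled outputs are scored with a sequence-level evaluation metric, and that score drives a policy-gradient (REINFORCE) update. This is a minimal illustration under stated assumptions, not the paper's released code. `ToyPolicy` is a hypothetical position-wise policy standing in for a real encoder–decoder. `rouge1_f1` is a simplified ROUGE-1 F1 used as the reward. The baseline is a self-critical-style choice, the reward of the greedy decode. The names `policy_gradient_update` and `train_step` are also illustrative. Because training conditions on the model's own samples and optimizes the test-time metric directly, this setup targets both exposure bias and the train/test mismatch.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rouge1_f1(candidate, reference):
    """Toy sequence-level reward: clipped unigram-overlap F1 (a simplified ROUGE-1)."""
    overlap = sum(min(candidate.count(w), reference.count(w)) for w in set(candidate))
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


class ToyPolicy:
    """Hypothetical stand-in for a decoder: one independent logit vector per output step.

    A real seq2seq decoder would condition on the encoder states and previous tokens.
    """

    def __init__(self, vocab, length):
        self.vocab = vocab
        self.logits = [[0.0] * len(vocab) for _ in range(length)]

    def sample(self, rng):
        """Sample one token id per step from the current policy (RL training)."""
        ids = range(len(self.vocab))
        return [rng.choices(ids, weights=softmax(row))[0] for row in self.logits]

    def greedy(self):
        """Arg-max decoding (test time, and the self-critical baseline)."""
        return [max(range(len(row)), key=row.__getitem__) for row in self.logits]

    def words(self, ids):
        return [self.vocab[i] for i in ids]


def policy_gradient_update(policy, sampled, advantage, lr):
    """REINFORCE: logits += lr * advantage * d log p(sampled) / d logits."""
    for t, action in enumerate(sampled):
        probs = softmax(policy.logits[t])  # computed before this row is modified
        for j, p in enumerate(probs):
            # d log softmax(z)[action] / d z_j = 1[j == action] - p_j
            grad = (1.0 if j == action else 0.0) - p
            policy.logits[t][j] += lr * advantage * grad


def train_step(policy, reference, rng, lr=0.5):
    """Sample a sequence, score it with the test-time metric, and update the policy."""
    sampled = policy.sample(rng)
    reward = rouge1_f1(policy.words(sampled), reference)
    baseline = rouge1_f1(policy.words(policy.greedy()), reference)
    policy_gradient_update(policy, sampled, reward - baseline, lr)
    return reward


if __name__ == "__main__":
    rng = random.Random(0)
    reference = ["the", "cat", "sat"]
    policy = ToyPolicy(["the", "cat", "sat", "on", "mat"], length=len(reference))
    for _ in range(2000):
        train_step(policy, reference, rng)
    # Because ROUGE-1 ignores word order, any permutation of the reference scores 1.0.
    print(policy.words(policy.greedy()))
```

Subtracting the greedy-decode reward means a sampled sequence is reinforced only when it beats the model's own test-time output. Because this baseline does not depend on the sampled tokens, it reduces variance without biasing the gradient.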
