Deep Reinforcement Learning for Dialogue Generation

Recent neural models of dialogue generation offer great promise for generating responses for conversational agents, but tend to be shortsighted, predicting utterances one at a time while ignoring their influence on future outcomes. Modeling the future direction of a dialogue is crucial to generating coherent, interesting dialogues, a need which led traditional NLP models of dialogue to draw on reinforcement learning. In this paper, we show how to integrate these goals, applying deep reinforcement learning to model future reward in chatbot dialogue. The model simulates dialogues between two virtual agents, using policy gradient methods to reward sequences that display three useful conversational properties: informativity (non-repetitive turns), coherence, and ease of answering (related to forward-looking function). We evaluate our model on diversity and length, as well as with human judges, showing that the proposed algorithm generates more interactive responses and manages to foster a more sustained conversation in dialogue simulation. This work marks a first step towards learning a neural conversational model based on the long-term success of dialogues.
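The abstract combines two ideas: a reward built from three conversational properties, and a policy gradient update that favors high-reward responses. The snippet below is a minimal sketch of both, under strong simplifying assumptions. It assumes each candidate reply already has a numeric score for ease of answering, informativity, and coherence, merges them as a weighted sum, and applies a REINFORCE-style (likelihood-ratio) update to a toy softmax policy that chooses among a fixed list of replies. The weights, candidate texts, scores, and the names `combined_reward`, `reinforce_step`, and `train` are hypothetical. A real dialogue model would generate replies token by token with a neural sequence model instead of picking from a list, and would score turns from simulated two-agent conversations.

```python
import math
import random

# Illustrative weights for the three properties named in the abstract
# (assumed values for this sketch, not taken from the paper).
WEIGHTS = {"ease_of_answering": 0.25, "informativity": 0.25, "coherence": 0.5}

# Hypothetical candidate replies with made-up, precomputed property scores.
CANDIDATES = [
    ("I don't know.", {"ease_of_answering": -1.0, "informativity": 0.0, "coherence": 0.2}),
    ("Where are you from?", {"ease_of_answering": 0.8, "informativity": 0.6, "coherence": 0.7}),
]


def combined_reward(scores):
    """Weighted sum of the per-property scores of one reply."""
    return sum(w * scores[name] for name, w in WEIGHTS.items())


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, reward_fn, lr=0.5, baseline=0.0, rng=random):
    """One REINFORCE update: sample a ~ pi, then add lr * (R - b) * grad log pi(a).

    For a softmax policy, d log pi(a) / d logit_j = 1[j == a] - pi_j.
    """
    probs = softmax(logits)
    a = rng.choices(range(len(logits)), weights=probs)[0]
    advantage = reward_fn(a) - baseline
    new_logits = [
        z + lr * advantage * ((1.0 if j == a else 0.0) - p)
        for j, (z, p) in enumerate(zip(logits, probs))
    ]
    return new_logits, a


def train(steps=200, seed=0):
    """Run REINFORCE on the toy candidates and return the final policy."""
    rng = random.Random(seed)
    rewards = [combined_reward(scores) for _, scores in CANDIDATES]
    baseline = sum(rewards) / len(rewards)  # constant, action-independent baseline
    logits = [0.0] * len(CANDIDATES)
    for _ in range(steps):
        logits, _ = reinforce_step(
            logits, lambda a: rewards[a], baseline=baseline, rng=rng
        )
    return softmax(logits)


if __name__ == "__main__":
    for (text, _), p in zip(CANDIDATES, train()):
        print(f"{p:.3f}  {text}")
```

Subtracting a constant baseline (here the mean reward over candidates) lowers the variance of the gradient estimate without biasing it, because the baseline does not depend on the sampled action. Since the log-softmax gradient sums to zero across logits, each update shifts probability mass toward the sampled reply or away from it rather than rescaling every logit, so the dull "I don't know." reply steadily loses probability to the more engaging one.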
