Exploring Implicit Feedback for Open Domain Conversation Generation

User feedback can be an effective indicator of the success of a human-machine conversation. However, to avoid interrupting the real-time conversation, explicit feedback is usually collected only at the end of a session. Alternatively, users' responses often carry implicit feedback, such as stance, sentiment, and emotion, towards the conversation content or the interlocutor. Exploiting this implicit feedback is therefore a natural way to optimize the conversation generation process. In this paper, we propose a novel reward function that uses implicit feedback to optimize the future reward of a reinforcement-learning-based neural conversation model. A simulation strategy is applied to explore the state-action space during training and testing. Experimental results show that the proposed approach outperforms the Seq2Seq model and a state-of-the-art reinforcement learning model for conversation generation in both automatic and human evaluations on the OpenSubtitles and Twitter datasets.
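To make the idea concrete, the sketch below shows one plausible way an implicit-feedback signal could drive a policy-gradient (REINFORCE-style) update: the agent picks a response, a user simulator produces a reply, a sentiment/stance scorer turns that reply into a scalar reward, and the policy is nudged toward responses that earn positive feedback. This is a minimal illustrative sketch only; the toy lexicon, the `implicit_feedback_reward` and `simulate_user_reply` functions, the candidate-list policy, and all hyperparameters are assumptions for illustration, not the paper's actual models or reward function.

```python
# Minimal sketch: reward shaping with implicit feedback in a policy-gradient loop.
# All names, lexicons, and hyperparameters here are illustrative assumptions,
# not the paper's implementation.
import numpy as np

# Toy sentiment/stance lexicon standing in for the implicit-feedback classifiers.
POSITIVE = {"great", "thanks", "interesting", "sure"}
NEGATIVE = {"boring", "whatever", "stop", "no"}

def implicit_feedback_reward(user_reply: str) -> float:
    """Score the simulated user's reply: positive words raise the reward,
    negative words lower it, neutral replies score zero."""
    tokens = user_reply.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return float(pos - neg) / max(len(tokens), 1)

def simulate_user_reply(bot_utterance: str) -> str:
    """Placeholder user simulator; in the paper's setting this would be a
    learned model that continues the dialogue."""
    return "interesting , tell me more" if "?" in bot_utterance else "boring"

# Candidate responses the toy policy chooses among; a real system would
# sample full utterances from a Seq2Seq decoder instead.
CANDIDATES = ["what do you think about it ?", "i do not know .", "ok ."]
theta = np.zeros(len(CANDIDATES))  # softmax policy parameters (logits)

def sample_response(rng):
    probs = np.exp(theta - theta.max())
    probs /= probs.sum()
    idx = rng.choice(len(CANDIDATES), p=probs)
    return idx, probs

rng = np.random.default_rng(0)
lr, baseline = 0.5, 0.0
for step in range(200):
    idx, probs = sample_response(rng)
    reply = simulate_user_reply(CANDIDATES[idx])
    reward = implicit_feedback_reward(reply)
    baseline = 0.9 * baseline + 0.1 * reward  # running baseline reduces variance
    # REINFORCE update: grad log pi(a) = one_hot(a) - probs for a softmax policy.
    grad_log_pi = -probs
    grad_log_pi[idx] += 1.0
    theta += lr * (reward - baseline) * grad_log_pi

print("learned preference:", CANDIDATES[int(theta.argmax())])
```

Run as-is, the loop converges toward the question-style response because the stubbed simulator rewards it; with a real user simulator and a Seq2Seq policy, the same reward-minus-baseline update would instead adjust decoder parameters over sampled utterances.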
