A Deep Reinforcement Learning Chatbot

We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. MILABOT is capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language generation and retrieval models, including template-based models, bag-of-words models, sequence-to-sequence neural network and latent variable neural network models. By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble. The system has been evaluated through A/B testing with real-world users, where it performed significantly better than many competing systems. Due to its machine learning architecture, the system is likely to improve with additional data.

[1]  Joseph Weizenbaum,et al.  ELIZA—a computer program for the study of natural language communication between man and machine , 1966, CACM.

[2]  Kenneth Mark Colby,et al.  Clinical artificial intelligence , 1981, Behavioral and Brain Sciences.

[3]  Norman M. Fraser,et al.  Dialogue Management for Telephone Information Systems , 1992, ANLP.

[4]  Long-Ji Lin,et al.  Reinforcement learning for robots using neural networks , 1992 .

[5]  Andrew C. Simpson,et al.  Black box and glass box evaluation of the SUNDIAL system , 1993, EUROSPEECH.

[6]  Volker Steinbiss,et al.  The Philips automatic train timetable information system , 1995, Speech Commun..

[7]  Roberto Pieraccini,et al.  User modeling for spoken dialogue system evaluation , 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.

[8]  Marilyn A. Walker,et al.  Reinforcement Learning for Spoken Dialogue Systems , 1999, NIPS.

[9]  Andrew Y. Ng,et al.  Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.

[10]  Andreas Stolcke,et al.  Dialogue act modeling for automatic tagging and recognition of conversational speech , 2000, CL.

[11]  Doina Precup,et al.  Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.

[12]  Roberto Pieraccini,et al.  A stochastic model of human-machine interaction for learning dialog strategies , 2000, IEEE Trans. Speech Audio Process..

[13]  Sanjoy Dasgupta,et al.  Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.

[14]  S. Singh,et al.  Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System , 2011, J. Artif. Intell. Res..

[15]  Ronald J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[16]  D. Scott Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics , 2004 .

[17]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[18]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[19]  David G. Novick,et al.  Root causes of lost time and user stress in a simple dialog system , 2005, INTERSPEECH.

[20]  H. Cuayahuitl,et al.  Human-computer dialogue simulation using hidden Markov models , 2005, IEEE Workshop on Automatic Speech Recognition and Understanding, 2005..

[21]  Maxine Eskénazi,et al.  Doing research on a deployed spoken dialogue system: one year of let's go! experience , 2006, INTERSPEECH.

[22]  Tim Paek,et al.  Reinforcement Learning for Spoken Dialogue Systems: Comparing Strengths and Weaknesses for Practical Deployment , 2006 .

[23]  Kallirroi Georgila,et al.  User simulation for spoken dialogue systems: learning and evaluation , 2006, INTERSPEECH.

[24]  Ye-Yi Wang,et al.  Proceedings of the Workshop on Bridging the Gap: Academic and Industrial Research in Dialog Technologies , 2007 .

[25]  Eric Atwell,et al.  Chatbots: Are they Really Useful? , 2007, LDV Forum.

[26]  Steve J. Young,et al.  Partially observable Markov decision processes for spoken dialog systems , 2007, Comput. Speech Lang..

[27]  Tanja Schultz,et al.  Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers , 2007, HLT-NAACL 2007.

[28]  Hui Ye,et al.  Agenda-Based User Simulation for Bootstrapping a POMDP Dialogue System , 2007, NAACL.

[29]  David R. Traum,et al.  Multi-party, Multi-issue, Multi-strategy Negotiation for Multi-modal Virtual Agents , 2008, IVA.

[30]  Kallirroi Georgila,et al.  Hybrid Reinforcement/Supervised Learning of Dialogue Policies from Fixed Data Sets , 2008, CL.

[31]  David Suendermann-Oeft,et al.  Are We There Yet? Research in Commercial Spoken Dialog Systems , 2009, TSD.

[32]  Peter A. Heeman,et al.  Representing the Reinforcement Learning state in a negotiation dialogue , 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.

[33]  Yehuda Koren,et al.  Matrix Factorization Techniques for Recommender Systems , 2009, Computer.

[34]  Jennifer Chu-Carroll,et al.  Building Watson: An Overview of the DeepQA Project , 2010, AI Mag..

[35]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[36]  Jason Williams An Empirical Evaluation of a Statistical Dialog System in Public Use , 2011, SIGDIAL Conference.

[37]  Yoshua Bengio,et al.  Deep Sparse Rectifier Neural Networks , 2011, AISTATS.

[38]  Kallirroi Georgila,et al.  Reinforcement Learning of Argumentation Dialogue Policies in Negotiation , 2011, INTERSPEECH.

[39]  Maxine Eskénazi,et al.  POMDP-based Let's Go system for spoken dialog challenge , 2012, 2012 IEEE Spoken Language Technology Workshop (SLT).

[40]  Antoine Raux,et al.  The Dialog State Tracking Challenge , 2013, SIGDIAL Conference.

[41]  Alex Graves,et al.  Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.

[42]  Christopher Potts,et al.  Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank , 2013, EMNLP.

[43]  Milica Gasic,et al.  POMDP-Based Statistical Spoken Dialog Systems: A Review , 2013, Proceedings of the IEEE.

[44]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[45]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[46]  M. Marelli,et al.  SemEval-2014 Task 1: Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Textual Entailment , 2014, *SEMEVAL.

[47]  Phil Blunsom,et al.  A Convolutional Neural Network for Modelling Sentences , 2014, ACL.

[48]  Luísa Coheur,et al.  Luke, I am Your Father: Dealing with Out-of-Domain Requests by Using Movies Subtitles , 2014, IVA.

[49]  Daan Wierstra,et al.  Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.

[50]  Lei Yu,et al.  Deep Learning for Answer Sentence Selection , 2014, ArXiv.

[51]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[52]  Ondrej Dusek,et al.  Alex: A Statistical Dialogue Systems Framework , 2014, TSD.

[53]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[54]  Sanja Fidler,et al.  Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[55]  David Vandyke,et al.  Learning from real users: rating dialogue success with neural networks for reinforcement learning in spoken dialogue systems , 2015, INTERSPEECH.

[56]  David Suendermann-Oeft,et al.  HALEF: An Open-Source Standard-Compliant Telephony-Based Modular Spoken Dialog System: A Review and An Outlook , 2015, IWSDS.

[57]  Joelle Pineau,et al.  The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems , 2015, SIGDIAL Conference.

[58]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[59]  Iulian Serban Text-Based Speaker Identification For Multi-Participant Open-Domain Dialogue Systems , 2015 .

[60]  Shimon Whiteson,et al.  Learning to Communicate with Deep Multi-Agent Reinforcement Learning , 2016, NIPS.

[61]  Jianfeng Gao,et al.  Deep Reinforcement Learning for Dialogue Generation , 2016, EMNLP.

[62]  Joelle Pineau,et al.  How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation , 2016, EMNLP.

[63]  David Vandyke,et al.  Continuously Learning Neural Dialogue Management , 2016, ArXiv.

[64]  Ramón López-Cózar,et al.  Automatic creation of scenarios for evaluating spoken dialogue systems via user-simulation , 2016, Knowl. Based Syst..

[65]  Guillaume Dubuisson Duplessis,et al.  Comparing System-response Retrieval Models for Open-domain and Casual Conversational Agent , 2016 .

[66]  Rob Fergus,et al.  Learning Multiagent Communication with Backpropagation , 2016, NIPS.

[67]  Jing He,et al.  Policy Networks with Two-Stage Training for Dialogue Systems , 2016, SIGDIAL Conference.

[68]  Romain Laroche,et al.  Incremental Human-Machine Dialogue Simulation , 2017, IWSDS.

[69]  Joelle Pineau,et al.  Generative Deep Neural Networks for Dialogue: A Short Review , 2016, ArXiv.

[70]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[71]  Joelle Pineau,et al.  On the Evaluation of Dialogue Systems with Next Utterance Classification , 2016, SIGDIAL Conference.

[72]  Jianfeng Gao,et al.  A Human Generated MAchine Reading COmprehension Dataset , 2018 .

[73]  Maxine Eskénazi,et al.  DialPort: Connecting the spoken dialog research community to real user data , 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).

[74]  George Kurian,et al.  Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.

[75]  Antoine Raux,et al.  Introduction to the Special Issue on Dialogue State Tracking , 2016, Dialogue Discourse.

[76]  Angeliki Lazaridou,et al.  Towards Multi-Agent Communication-Based Language Learning , 2016, ArXiv.

[77]  Maxine Eskénazi,et al.  Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning , 2016, SIGDIAL Conference.

[78]  Zhou Yu,et al.  Strategy and Policy Learning for Non-Task-Oriented Conversational Systems , 2016, SIGDIAL Conference.

[79]  Jing He,et al.  A Sequence-to-Sequence Model for User Simulation in Spoken Dialogue Systems , 2016, INTERSPEECH.

[80]  Joelle Pineau,et al.  Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses , 2017, ACL.

[81]  Jason Weston,et al.  ParlAI: A Dialog Research Software Platform , 2017, EMNLP.

[82]  Joelle Pineau,et al.  A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues , 2016, AAAI.

[83]  Bing Liu,et al.  Iterative policy learning in end-to-end trainable task-oriented neural dialog models , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).

[84]  Stefan Lee,et al.  Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[85]  Joelle Pineau,et al.  Training End-to-End Dialogue Systems with the Ubuntu Dialogue Corpus , 2017, Dialogue Discourse.

[86]  Alexander Peysakhovich,et al.  Multi-Agent Cooperation and the Emergence of (Natural) Language , 2016, ICLR.

[87]  Joelle Pineau,et al.  MACA: A Modular Architecture for Conversational Agents , 2017, SIGDIAL Conference.

[88]  Pieter Abbeel,et al.  Emergence of Grounded Compositional Language in Multi-Agent Populations , 2017, AAAI.