Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models

We investigate the task of building open-domain conversational dialogue systems from large dialogue corpora using generative models. Generative models produce system responses autonomously, word by word, which opens up the possibility of realistic, flexible interactions. In support of this goal, we extend the recently proposed hierarchical recurrent encoder-decoder neural network to the dialogue domain and demonstrate that this model is competitive with state-of-the-art neural language models and back-off n-gram models. We investigate the limitations of this and similar approaches and show how the model's performance can be improved by bootstrapping the learning from a larger question-answer pair corpus and from pretrained word embeddings.
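To make the hierarchical structure concrete, the following is a minimal sketch of a hierarchical recurrent encoder-decoder scoring a response given a dialogue history. It is an illustration under stated assumptions, not the configuration used in this work: the class name `TinyHRED`, all dimensions, the plain tanh recurrent units, the random untrained weights, and the choice to condition the decoder only through its initial state are hypothetical simplifications.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix (untrained; for illustration only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_step(w_in, w_rec, x, h):
    """One step of a plain tanh RNN (assumed here for brevity)."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class TinyHRED:
    """Hypothetical toy model: utterance encoder -> context RNN -> decoder."""

    def __init__(self, vocab_size, emb_dim=8, utt_dim=8, ctx_dim=8, seed=0):
        rng = random.Random(seed)
        self.emb = make_matrix(vocab_size, emb_dim, rng)
        self.enc_in = make_matrix(utt_dim, emb_dim, rng)
        self.enc_rec = make_matrix(utt_dim, utt_dim, rng)
        self.ctx_in = make_matrix(ctx_dim, utt_dim, rng)
        self.ctx_rec = make_matrix(ctx_dim, ctx_dim, rng)
        self.dec_in = make_matrix(utt_dim, emb_dim, rng)
        self.dec_rec = make_matrix(utt_dim, utt_dim, rng)
        self.dec_ctx = make_matrix(utt_dim, ctx_dim, rng)
        self.out = make_matrix(vocab_size, utt_dim, rng)
        self.utt_dim, self.ctx_dim = utt_dim, ctx_dim

    def encode_utterance(self, tokens):
        # Lower level: read one utterance token by token.
        h = [0.0] * self.utt_dim
        for t in tokens:
            h = rnn_step(self.enc_in, self.enc_rec, self.emb[t], h)
        return h

    def encode_context(self, utterances):
        # Upper level: read the sequence of utterance vectors.
        c = [0.0] * self.ctx_dim
        for u in utterances:
            c = rnn_step(self.ctx_in, self.ctx_rec, self.encode_utterance(u), c)
        return c

    def response_log_prob(self, utterances, response):
        # Decoder state is initialised from the context vector (a simplification),
        # then the response is scored word by word; response[0] is a start token.
        c = self.encode_context(utterances)
        h = [math.tanh(v) for v in matvec(self.dec_ctx, c)]
        log_prob, prev = 0.0, response[0]
        for t in response[1:]:
            h = rnn_step(self.dec_in, self.dec_rec, self.emb[prev], h)
            log_prob += math.log(softmax(matvec(self.out, h))[t])
            prev = t
        return log_prob


model = TinyHRED(vocab_size=10)
history = [[1, 2, 3], [4, 5]]   # two earlier utterances as token ids
response = [0, 6, 7, 9]         # id 0 is assumed to be the start token
log_prob = model.response_log_prob(history, response)
perplexity = math.exp(-log_prob / (len(response) - 1))
```

Because the weights are random and small, the per-word perplexity here stays close to the vocabulary size; training would lower it. Perplexity of this kind is a standard way to compare such a model against n-gram and neural language model baselines.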
