Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation

We introduce the multiresolution recurrent neural network, which extends the sequence-to-sequence framework to model natural language generation as two parallel discrete stochastic processes: a sequence of high-level coarse tokens and a sequence of natural language tokens. There are many ways to estimate or learn the high-level coarse tokens, but we argue that a simple extraction procedure is sufficient to capture a wealth of high-level discourse semantics. Such a procedure allows training the multiresolution recurrent neural network by maximizing the exact joint log-likelihood over both sequences. In contrast to the standard log-likelihood objective w.r.t. natural language tokens (word perplexity), optimizing the joint log-likelihood biases the model towards modeling high-level abstractions. We apply the proposed model to the task of dialogue response generation in two challenging domains: the Ubuntu technical support domain and Twitter conversations. On Ubuntu, the model outperforms competing approaches by a substantial margin, achieving state-of-the-art results according to both automatic evaluation metrics and a human evaluation study. On Twitter, the model appears to generate more relevant and on-topic responses according to automatic evaluation metrics. Finally, our experiments demonstrate that the proposed model is more adept at overcoming the sparsity of natural language and is better able to capture long-term structure.
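The joint objective described above decomposes into the sum of two sequence log-likelihoods: one for the coarse token sequence and one for the natural-language token sequence conditioned on it. The following is a minimal sketch of that decomposition; the per-token probabilities and the function names are hypothetical stand-ins for the outputs of the two decoder RNNs, not the paper's actual implementation.

```python
import math

def sequence_log_likelihood(token_probs):
    """Log-likelihood of a sequence given the model's per-token probabilities."""
    return sum(math.log(p) for p in token_probs)

def joint_log_likelihood(coarse_probs, word_probs):
    """Joint objective: log p(coarse sequence) + log p(word sequence | coarse).

    Maximizing this sum, rather than the word-level term alone, biases the
    model toward also explaining the high-level coarse abstraction.
    """
    return sequence_log_likelihood(coarse_probs) + sequence_log_likelihood(word_probs)

# Hypothetical per-token probabilities from the two decoders.
coarse_probs = [0.5, 0.25]        # e.g. extracted coarse/entity tokens
word_probs = [0.1, 0.2, 0.05]     # natural-language tokens

print(joint_log_likelihood(coarse_probs, word_probs))
```

In a real model, each probability would come from a softmax over the decoder's vocabulary at that timestep, and the sum would be maximized by gradient ascent over both decoders jointly.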
