A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues

Sequential data often possess a hierarchical structure with complex dependencies between subsequences, such as those found between the utterances in a dialogue. To model this kind of generative process, we propose a neural network-based generative architecture with latent stochastic variables that span a variable number of time steps. We apply the proposed model to the task of dialogue response generation and compare it with recent neural network architectures. We evaluate model performance with automatic evaluation metrics and with a human evaluation. The experiments demonstrate that our model improves upon recently proposed models and that the latent variables help the model generate long outputs and maintain the dialogue context.
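As a rough illustration of what "latent stochastic variables that span a variable number of time steps" can mean, the toy sketch below samples one latent vector per utterance and holds it fixed while a variable number of tokens is drawn. It is not the model from the paper: the vocabulary, the function name `sample_dialogue`, the scoring rule, and the context update are all hypothetical stand-ins for the learned neural components.

```python
import math
import random

# Hypothetical toy vocabulary; "<eos>" ends an utterance.
VOCAB = ["hello", "how", "are", "you", "fine", "thanks", "<eos>"]


def sample_dialogue(num_utterances, latent_dim=2, max_len=8, seed=0):
    """Sample a toy dialogue with one latent vector per utterance."""
    rng = random.Random(seed)
    # Running summary of the dialogue so far (stand-in for a context encoder).
    context = [0.0] * latent_dim
    dialogue = []
    for _ in range(num_utterances):
        # One latent vector per utterance; its prior mean depends on the context.
        z = [rng.gauss(c, 1.0) for c in context]
        tokens = []
        while len(tokens) < max_len:
            # Toy "decoder" (an assumption, not a trained network): token scores
            # depend on z, which stays fixed for every token in the utterance.
            weights = [
                math.exp(math.sin(sum(z) * (i + 1) + len(tokens)))
                for i in range(len(VOCAB))
            ]
            token = rng.choices(VOCAB, weights=weights, k=1)[0]
            tokens.append(token)
            if token == "<eos>":
                break
        dialogue.append((z, tokens))
        # Toy context update mixing the old summary with the new utterance.
        context = [
            0.5 * c + 0.5 * math.tanh(zi + 0.1 * len(tokens))
            for c, zi in zip(context, z)
        ]
    return dialogue


if __name__ == "__main__":
    for z, tokens in sample_dialogue(3):
        print([round(v, 2) for v in z], " ".join(tokens))
```

Each utterance has its own length, so a single latent vector covers a different number of token steps from one utterance to the next; the context summary carries information across utterances.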
