Incorporating Unstructured Textual Knowledge Sources into Neural Dialogue Systems

We present initial methods for incorporating unstructured external textual information into neural dialogue systems that predict the next utterance of a user in a two-party chat conversation. The main objective is to leverage additional information about the topic of the conversation to improve prediction accuracy. We propose a simple method for extracting this knowledge, using a combination of hashing and TF-IDF, and a way to use it to select the best next utterance of a conversation: the extracted knowledge is encoded into a vector representation with a recurrent neural network (RNN). This encoding is combined with RNN encodings of the context and response of the conversation in order to make a prediction. We perform a case study using the recently released Ubuntu Dialogue Corpus, where the additional knowledge considered consists of the Ubuntu manpages. Preliminary results suggest that leveraging external knowledge sources in this manner could improve performance on next-utterance prediction.
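As a rough illustration of the extraction step, the sketch below combines a hash-table lookup with TF-IDF scoring to pick one knowledge entry for a conversation context. It is a minimal sketch under stated assumptions, not the method as implemented in the paper: the function names (`tokenize`, `build_idf`, `tfidf_score`, `retrieve`), the whitespace tokenizer, the toy manpage descriptions, and the rule of falling back to all entries when no command name appears in the context are all hypothetical. The RNN encoding of the retrieved text is not shown.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for a toy example.
    return text.lower().split()


def build_idf(docs):
    # Inverse document frequency over the knowledge entries.
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf_score(query_tokens, doc_tokens, idf):
    # Sum of TF-IDF weights of the query terms inside one entry.
    tf = Counter(doc_tokens)
    return sum(
        tf[t] / len(doc_tokens) * idf.get(t, 0.0) for t in set(query_tokens)
    )


def retrieve(context, manpages):
    # manpages: dict mapping a command name to its description text.
    docs = {name: tokenize(text) for name, text in manpages.items()}
    idf = build_idf(docs)
    query = tokenize(context)
    # Hash lookup: keep entries whose name appears verbatim in the context.
    candidates = [t for t in query if t in docs]
    if not candidates:
        # Hypothetical fallback: rank every entry by TF-IDF.
        candidates = list(docs)
    return max(candidates, key=lambda name: tfidf_score(query, docs[name], idf))


# Toy stand-ins for manpage descriptions (not taken from the real corpus).
manpages = {
    "grep": "search files for lines matching a pattern",
    "chmod": "change file mode bits and permissions",
    "tar": "archive utility to pack and extract files",
}

by_name = retrieve("how do i change permissions on a file with chmod", manpages)
by_tfidf = retrieve("how can i extract an archive", manpages)
```

In the first call the command name is found directly through the dictionary lookup; in the second no name is mentioned, so the entry is chosen by TF-IDF overlap with the context alone.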
