Learning to Memorize in Neural Task-Oriented Dialogue Systems

In this thesis, we leverage the neural copy mechanism and memory-augmented neural networks (MANNs) to address existing challenges in neural task-oriented dialogue learning. We show the effectiveness of our strategy through strong results on multi-domain dialogue state tracking, retrieval-based dialogue systems, and generation-based dialogue systems. First, we propose a transferable dialogue state generator (TRADE) that uses its copy mechanism to remove the dependence on a predefined dialogue ontology and to share knowledge between domains. We also evaluate dialogue state tracking on unseen domains and show that TRADE enables zero-shot tracking and can adapt to new domains from only a few examples without forgetting previously learned domains. Second, we use MANNs to improve retrieval-based dialogue learning, since they can capture sequential dependencies in a dialogue and memorize long-term information. We also propose a recorded delexicalization copy strategy that replaces real entity values with ordered entity types. Our models surpass other retrieval baselines, especially when the conversation has a large number of turns. Lastly, we tackle generation-based dialogue learning with two proposed models, memory-to-sequence (Mem2Seq) and the global-to-local memory pointer network (GLMP). Mem2Seq is the first model to combine multi-hop memory attention with the copy mechanism. GLMP further introduces response sketching and double-pointer copying. We show that GLMP achieves state-of-the-art performance in human evaluation.
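To make the idea of replacing real entity values with ordered entity types more concrete, the following is a minimal Python sketch, not the implementation used in the thesis. The entity lexicon `ENTITY_TYPES`, the placeholder format `<type_k>`, and the function names `delexicalize` and `relexicalize` are all assumptions introduced for illustration. Each entity value is mapped to its type plus an index given by its order of first appearance, and the mapping is recorded so that placeholders can later be turned back into real values.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical entity lexicon (an assumption, not taken from the thesis):
# lowercase surface value -> entity type.
ENTITY_TYPES: Dict[str, str] = {
    "paris": "location",
    "london": "location",
    "italian": "cuisine",
}


def delexicalize(
    tokens: List[str], record: Optional[Dict[str, str]] = None
) -> Tuple[List[str], Dict[str, str]]:
    """Replace known entity values with ordered, typed placeholders.

    `record` maps entity value -> placeholder and can be carried across
    dialogue turns so that the same value keeps the same placeholder.
    """
    record = {} if record is None else record
    out: List[str] = []
    for tok in tokens:
        key = tok.lower()
        etype = ENTITY_TYPES.get(key)
        if etype is None:
            out.append(tok)  # not an entity: keep the token unchanged
            continue
        if key not in record:
            # Index = order of first appearance among entities of this type.
            idx = 1 + sum(ph.startswith(f"<{etype}_") for ph in record.values())
            record[key] = f"<{etype}_{idx}>"
        out.append(record[key])
    return out, record


def relexicalize(tokens: List[str], record: Dict[str, str]) -> List[str]:
    """Map placeholders back to the recorded (lowercased) entity values."""
    inverse = {ph: val for val, ph in record.items()}
    return [inverse.get(tok, tok) for tok in tokens]


turn1, rec = delexicalize("book a table in Paris not London".split())
turn2, rec = delexicalize("Paris italian food".split(), rec)
restored = relexicalize(["try", "<location_2>"], rec)
```

In this sketch a model would only ever see placeholders such as `<location_1>`, so its vocabulary does not grow with the number of entity values, and a retrieved response containing placeholders can be filled in from the record afterwards.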
