Few-Shot Generalization Across Dialogue Tasks

Machine-learning-based dialogue managers can learn complex behaviors in order to complete a task, but extending their capabilities to new domains is not straightforward. We investigate different policies' ability to handle uncooperative user behavior, and how well expertise in completing one task (such as making restaurant reservations) can be reapplied when learning a new one (e.g. booking a hotel). We introduce the Recurrent Embedding Dialogue Policy (REDP), which embeds system actions and dialogue states in the same vector space. REDP contains a memory component and an attention mechanism based on a modified Neural Turing Machine, and significantly outperforms a baseline LSTM classifier on this task. We also show that both our architecture and the baseline solve the bAbI dialogue task, achieving 100% test accuracy.
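To illustrate what embedding system actions and dialogue states in the same vector space makes possible, the following is a minimal sketch of similarity-based action selection. It is not the REDP implementation: the recurrent network, memory component, and attention mechanism are omitted, and the cosine similarity, the three-dimensional vectors, and all names (`cosine`, `select_action`, `ACTION_EMBEDDINGS`) are assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_action(state_embedding, action_embeddings):
    """Return the name of the action whose embedding is most similar to the state."""
    return max(
        action_embeddings,
        key=lambda name: cosine(state_embedding, action_embeddings[name]),
    )


# Hypothetical, hand-picked embeddings; a trained policy would learn these.
ACTION_EMBEDDINGS = {
    "ask_cuisine": [1.0, 0.0, 0.0],
    "ask_location": [0.0, 1.0, 0.0],
    "confirm_booking": [0.0, 0.0, 1.0],
}

# Hypothetical dialogue-state embedding, standing in for the output of a
# recurrent encoder over the dialogue history.
state = [0.1, 0.9, 0.2]
chosen = select_action(state, ACTION_EMBEDDINGS)
print(chosen)
```

Because actions are represented as vectors rather than as fixed output classes, a new domain's actions can in principle be placed in the same space and compared against dialogue states in the same way, which is one plausible reading of why a shared space could help transfer between tasks.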
