UniConv: A Unified Conversational Neural Architecture for Multi-domain Task-oriented Dialogues

Building an end-to-end conversational agent for multi-domain task-oriented dialogues has been an open challenge for two main reasons. First, tracking dialogue states across multiple domains is non-trivial: the dialogue agent must obtain complete states from all relevant domains, and these domains may contain slots shared with other domains as well as slots unique to a single domain. Second, the dialogue agent must process various types of information across domains, including the dialogue context, dialogue states, and database, to generate natural responses to users. Unlike existing approaches, which often train each module separately, we propose UniConv, a novel unified neural architecture for end-to-end conversational systems in multi-domain task-oriented dialogues. UniConv jointly trains (i) a Bi-level State Tracker, which tracks dialogue states by learning signals at the slot level and the domain level independently, and (ii) a Joint Dialogue Act and Response Generator, which incorporates information from the various input components and models dialogue acts and target responses simultaneously. We conduct comprehensive experiments in dialogue state tracking, context-to-text, and end-to-end settings on the MultiWOZ2.1 benchmark, achieving superior performance over competitive baselines.
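To make the first challenge concrete, the following minimal sketch shows one way a multi-domain dialogue state with shared and domain-specific slots could be represented. It is not the UniConv model or its state tracker: the toy ontology, the slot names, and the helper functions `shared_and_unique_slots` and `update_state` are all assumptions introduced for illustration only.

```python
from collections import defaultdict

# Hypothetical toy ontology (illustrative only, not taken from the paper):
# each domain maps to the set of slot names it uses.
ONTOLOGY = {
    "hotel": {"area", "pricerange", "stars"},
    "restaurant": {"area", "pricerange", "food"},
    "train": {"departure", "destination"},
}


def shared_and_unique_slots(ontology):
    """Split slot names into those used by several domains and those used by one."""
    domains_by_slot = defaultdict(set)
    for domain, slots in ontology.items():
        for slot in slots:
            domains_by_slot[slot].add(domain)
    shared = {s for s, d in domains_by_slot.items() if len(d) > 1}
    unique = {s for s, d in domains_by_slot.items() if len(d) == 1}
    return shared, unique


def update_state(state, turn_updates):
    """Merge one turn's (domain, slot) -> value updates into the running state."""
    new_state = dict(state)
    new_state.update(turn_updates)
    return new_state


shared, unique = shared_and_unique_slots(ONTOLOGY)

# The state is keyed by (domain, slot), so a shared slot such as "area"
# can hold a different value in each domain.
state = {}
state = update_state(state, {("hotel", "area"): "north"})
state = update_state(
    state, {("restaurant", "area"): "centre", ("restaurant", "food"): "thai"}
)

# Domain-level view of the same state: which domains are active so far.
active_domains = sorted({domain for domain, _ in state})
```

Keying the state by `(domain, slot)` pairs keeps the slot-level view (which value a slot holds) and the domain-level view (which domains are active) recoverable from one structure, which loosely mirrors the two levels of signal the abstract describes.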
