Scalable and Accurate Dialogue State Tracking via Hierarchical Sequence Generation

Existing approaches to dialogue state tracking (DST) rely on pre-defined ontologies consisting of a set of all possible slot types and values. Although such approaches exhibit promising performance on single-domain benchmarks, their computational complexity grows proportionally with the number of pre-defined slots that need tracking. This issue becomes more severe for multi-domain dialogues, which include larger numbers of slots. In this paper, we investigate how to approach DST with a generation framework that does not require a pre-defined ontology list. Given each turn of user utterance and system response, we directly generate a sequence of belief states by applying a hierarchical encoder-decoder structure. In this way, the computational complexity of our model remains constant regardless of the number of pre-defined slots. Experiments on both multi-domain and single-domain dialogue state tracking datasets show that our model not only scales easily with an increasing number of pre-defined domains and slots but also achieves state-of-the-art performance.
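The hierarchical generation idea described above can be illustrated with a toy sketch: utterances are encoded at the token level, turn encodings are aggregated into a dialogue-level representation, and a decoder emits belief-state strings directly instead of classifying over a fixed slot-value ontology. Everything here is an illustrative assumption (deterministic hash "embeddings", averaging "encoders", a nearest-neighbor "decoder"), not the paper's actual architecture.

```python
# Toy hierarchical encode-then-generate sketch for dialogue state tracking.
# All components are stand-ins: real models use learned embeddings, recurrent
# or attention-based encoders, and an autoregressive decoder.
import hashlib


def embed(token, dim=8):
    """Deterministic pseudo-embedding (stand-in for a learned embedding table)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


def encode_utterance(tokens, dim=8):
    """Token-level encoder: here, a simple average of token embeddings."""
    vecs = [embed(t, dim) for t in tokens]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def encode_dialogue(turns, dim=8):
    """Turn-level encoder: aggregates per-turn vectors into one dialogue vector."""
    turn_vecs = [encode_utterance(t, dim) for t in turns]
    return [sum(col) / len(turn_vecs) for col in zip(*turn_vecs)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def generate_states(dialogue_vec, candidates, max_len=2):
    """Greedy 'decoder': emits the candidate belief-state strings closest to
    the dialogue vector. A real decoder generates tokens autoregressively,
    conditioned on previously generated output."""
    ranked = sorted(candidates, key=lambda s: sq_dist(dialogue_vec, embed(s)))
    return ranked[:max_len]


turns = [
    ["i", "need", "a", "cheap", "hotel"],
    ["what", "area", "do", "you", "prefer"],
]
candidates = ["hotel-price=cheap", "hotel-area=north", "train-day=friday"]
print(generate_states(encode_dialogue(turns), candidates, max_len=1))
```

Note the cost structure this sketch shares with the generation framing: the encoder work depends only on the dialogue length, and the decoder emits a short sequence of states, so nothing in the forward pass scales with the size of a pre-defined slot ontology.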
