TD-GIN: Token-level Dynamic Graph-Interactive Network for Joint Multiple Intent Detection and Slot Filling

Intent detection and slot filling are the two main tasks in building a spoken language understanding (SLU) system. Currently, most work on SLU has focused on the single-intent scenario and paid less attention to the multi-intent scenario, which is common in real-world applications. In addition, multi-intent SLU faces a unique challenge: how to effectively incorporate information from multiple intents to guide slot prediction. In this paper, we propose a Token-level Dynamic Graph-Interactive Network (TD-GIN) for joint multiple intent detection and slot filling, where we model the interaction between multiple intents and each token's slot in a unified graph architecture. With the graph interaction mechanism, our framework can automatically extract relevant intent information to guide each token's slot prediction, enabling fine-grained integration of intent information for token-level slot filling. Experiments on two multi-intent datasets show that our model achieves state-of-the-art performance, outperforming previous methods by a large margin. Comprehensive analysis empirically shows that our framework successfully captures relevant information from multiple intents to improve SLU performance.
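To make the described interaction concrete, below is a minimal sketch (not the authors' released code) of a token-level intent-slot graph interaction layer, assuming a graph-attention-style update in which each token node attends over the embeddings of the intents predicted for the utterance. All module, tensor, and dimension names here are illustrative assumptions.

```python
# Minimal sketch of a token-level intent-slot graph interaction layer.
# Assumption: token states come from an utterance encoder (e.g. a BiLSTM),
# and intent embeddings correspond to the intents predicted for the utterance.

import torch
import torch.nn as nn
import torch.nn.functional as F


class TokenIntentGraphLayer(nn.Module):
    """One graph-interaction step: every token node attends over intent nodes."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.w_token = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.w_intent = nn.Linear(hidden_dim, hidden_dim, bias=False)
        # Attention scoring vector over concatenated node pairs (GAT-style).
        self.attn = nn.Linear(2 * hidden_dim, 1, bias=False)

    def forward(self, token_states, intent_embs):
        # token_states: (batch, seq_len, hidden_dim) from the utterance encoder
        # intent_embs:  (batch, num_intents, hidden_dim) for predicted intents
        t = self.w_token(token_states)   # (B, T, D)
        i = self.w_intent(intent_embs)   # (B, K, D)

        # Score every (token, intent) pair.
        t_exp = t.unsqueeze(2).expand(-1, -1, i.size(1), -1)  # (B, T, K, D)
        i_exp = i.unsqueeze(1).expand(-1, t.size(1), -1, -1)  # (B, T, K, D)
        scores = self.attn(torch.cat([t_exp, i_exp], dim=-1)).squeeze(-1)  # (B, T, K)
        alpha = F.softmax(F.leaky_relu(scores), dim=-1)

        # Aggregate intent information per token and fuse it with the token state,
        # so that each token's slot prediction is guided by the relevant intents.
        intent_ctx = torch.einsum("btk,bkd->btd", alpha, i)
        return F.relu(t + intent_ctx)    # updated token-slot node states


if __name__ == "__main__":
    layer = TokenIntentGraphLayer(hidden_dim=64)
    tokens = torch.randn(2, 10, 64)   # encoder outputs for 10 tokens
    intents = torch.randn(2, 3, 64)   # embeddings of 3 predicted intents
    out = layer(tokens, intents)
    print(out.shape)                  # torch.Size([2, 10, 64])
```

In this sketch the per-token attention weights over intent nodes play the role of the "dynamic" selection of relevant intent information described in the abstract; the actual TD-GIN architecture may differ in how nodes, edges, and updates are defined.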
