Knowing Where to Leverage: Context-Aware Graph Convolutional Network With an Adaptive Fusion Layer for Contextual Spoken Language Understanding

Spoken language understanding (SLU) systems, a key component of task-oriented dialogue systems, aim to understand users’ utterances. In this paper, we focus on improving contextual SLU, where the main challenge is how to incorporate dialog context information (contextual information) effectively. Existing approaches all use the same contextual information to guide slot filling at every token, which may inject irrelevant information and result in ambiguity. To tackle this problem, we propose a context-aware graph convolutional network (GCN) with an adaptive fusion layer for contextual SLU. The context-aware GCN automatically aggregates the contextual information, which frees our model from manually designed heuristic aggregation functions. Meanwhile, an adaptive fusion layer is applied at each token to dynamically incorporate the relevant contextual information, which achieves a fine-grained transfer of contextual information to guide token-level slot filling. Experiments on the Simulated Dialog Dataset show that our model achieves state-of-the-art performance and outperforms previous methods by a large margin (+3.67% on Sim-R, +4.18% on Sim-M and +3.75% on the Overall dataset). In addition, we explore and analyze the use of a pre-trained model (i.e., BERT) in our framework. We show that incorporating BERT brings a large improvement in the low-resource setting.
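The two ideas named in the abstract, graph-convolutional aggregation of context and per-token fusion, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no equations, so the mean-aggregation GCN step, the dot-product attention, the parameter-free sigmoid gate, and the names `gcn_layer` and `adaptive_fusion` are all assumptions chosen for illustration.

```python
import math


def gcn_layer(adj, feats, weight):
    """One graph-convolution step (assumed form): ReLU(mean over neighbours and self, then W)."""
    n = len(feats)
    out = []
    for i in range(n):
        # Neighbours of node i, plus a self-loop.
        nbrs = [j for j in range(n) if adj[i][j] or j == i]
        # Mean of the neighbour features.
        agg = [sum(feats[j][k] for j in nbrs) / len(nbrs)
               for k in range(len(feats[0]))]
        # Linear projection followed by ReLU.
        out.append([max(0.0, sum(agg[k] * weight[k][d] for k in range(len(agg))))
                    for d in range(len(weight[0]))])
    return out


def adaptive_fusion(tokens, contexts):
    """Per-token fusion (assumed form): attention over context vectors, then a sigmoid gate."""
    fused = []
    for t in tokens:
        # Dot-product relevance of every context vector to this token.
        scores = [sum(a * b for a, b in zip(t, c)) for c in contexts]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        attn = [e / z for e in exps]
        # Token-specific context summary.
        ctx = [sum(a * c[k] for a, c in zip(attn, contexts)) for k in range(len(t))]
        # Gate decides how much of that context this token takes in.
        # A real model would learn this gate; here it has no parameters.
        gate = 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(t, ctx))))
        fused.append([x + gate * y for x, y in zip(t, ctx)])
    return fused


# Three context nodes connected in a chain, with toy 2-d features.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
contexts = gcn_layer(adj, feats, identity)

# Two tokens of the current utterance; each receives its own context mix.
tokens = [[1.0, 0.0], [0.0, 1.0]]
fused = adaptive_fusion(tokens, contexts)
```

Because the attention weights and the gate are computed separately for each token, two tokens in the same utterance can draw on different parts of the dialog context, which is the fine-grained behaviour the abstract contrasts with using one shared context vector for all tokens.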
