A Bi-Model Based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling

Intent detection and slot filling are two main tasks for building a spoken language understanding (SLU) system. Multiple deep learning based models have demonstrated good results on these tasks. The most effective algorithms are based on the structures of sequence-to-sequence models (or "encoder-decoder" models), and generate the intents and semantic tags either using separate models or a joint model. Most of the previous studies, however, either treat intent detection and slot filling as two separate parallel tasks, or use a single sequence-to-sequence model to generate both semantic tags and the intent. None of these approaches considers the cross-impact between the intent detection task and the slot filling task. In this paper, new Bi-model based RNN semantic frame parsing network structures are designed to perform the intent detection and slot filling tasks jointly, by considering their cross-impact on each other using two correlated bidirectional LSTMs (BLSTMs). Our Bi-model structure with a decoder achieves state-of-the-art results on the benchmark ATIS data, with about 0.5% intent accuracy improvement and 0.9% slot filling improvement.
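
To make the bi-model idea concrete, below is a minimal PyTorch sketch of two task-specific BLSTM encoders whose representations feed into each other's prediction heads. It is an illustration under stated assumptions, not the authors' exact architecture: the class name, layer sizes, and the simple concatenation used to model the cross-impact (the paper instead shares hidden states between the two correlated BLSTMs) are all hypothetical choices.

```python
# Sketch: two correlated BLSTMs for joint intent detection and slot filling.
# The cross-impact between the tasks is approximated here by letting each
# task head read the other encoder's output; dimensions are assumptions.
import torch
import torch.nn as nn

class BiModelSLU(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, n_slots, n_intents):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # One bidirectional LSTM encoder per task.
        self.intent_enc = nn.LSTM(embed_dim, hidden_dim,
                                  bidirectional=True, batch_first=True)
        self.slot_enc = nn.LSTM(embed_dim, hidden_dim,
                                bidirectional=True, batch_first=True)
        # Each head sees both encoders' states: 2*hidden_dim from each.
        self.intent_out = nn.Linear(4 * hidden_dim, n_intents)
        self.slot_out = nn.Linear(4 * hidden_dim, n_slots)

    def forward(self, tokens):
        x = self.embed(tokens)          # (batch, seq_len, embed_dim)
        h_int, _ = self.intent_enc(x)   # (batch, seq_len, 2*hidden_dim)
        h_slot, _ = self.slot_enc(x)    # (batch, seq_len, 2*hidden_dim)
        # Slot tagging: per-token logits from both encoders' states.
        slot_logits = self.slot_out(torch.cat([h_slot, h_int], dim=-1))
        # Intent detection: summary of both encoders at the final step.
        pooled = torch.cat([h_int[:, -1], h_slot[:, -1]], dim=-1)
        intent_logits = self.intent_out(pooled)
        return intent_logits, slot_logits

# Usage with toy dimensions (assumed, not from the paper):
model = BiModelSLU(vocab_size=1000, embed_dim=64, hidden_dim=64,
                   n_slots=120, n_intents=21)
intent_logits, slot_logits = model(torch.randint(0, 1000, (2, 10)))
```

Training would minimize a slot-tagging loss and an intent-classification loss over these two heads; the paper's decoder variant additionally generates slot tags sequentially rather than with the per-token linear layer used here.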
