A Deep Learning Based Multi-task Ensemble Model for Intent Detection and Slot Filling in Spoken Language Understanding

An important component of every dialog system is language understanding, popularly known as Spoken Language Understanding (SLU). Intent detection (ID) and slot filling (SF) are two important and inter-related SLU tasks. In this paper, we propose a deep learning based multi-task ensemble model that performs intent detection and slot filling jointly. We use deep bi-directional recurrent neural networks (RNNs) with long short-term memory (LSTM) and gated recurrent unit (GRU) cells as the base-level classifiers, and a multi-layer perceptron (MLP) framework to combine their outputs. The model is trained on a combined word embedding representation obtained from both GloVe and word2vec, further augmented with syntactic Part-of-Speech (PoS) information. On the benchmark ATIS dataset, our experiments show that the proposed ensemble multi-task model (MTM) achieves better results than the individual models and existing state-of-the-art systems. Experiments on another dataset, TRAINS, also show that the proposed multi-task ensemble model is more effective than the individual models.
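To make the described architecture concrete, the following is a minimal PyTorch sketch of how such a multi-task ensemble could be wired: two bi-directional base classifiers (one LSTM, one GRU), each with a per-token slot head and a pooled intent head, whose output scores are combined by a small MLP per task. The layer sizes, the mean-pooling used for the intent head, the MLP depth, and the feature dimensions are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class BiRNNTagger(nn.Module):
    """One base-level classifier: a bi-directional RNN (LSTM or GRU) that
    jointly predicts per-token slot labels and an utterance-level intent."""
    def __init__(self, feat_dim, hidden_dim, n_slots, n_intents, cell="lstm"):
        super().__init__()
        rnn_cls = nn.LSTM if cell == "lstm" else nn.GRU
        self.rnn = rnn_cls(feat_dim, hidden_dim,
                           batch_first=True, bidirectional=True)
        self.slot_out = nn.Linear(2 * hidden_dim, n_slots)      # slot filling head
        self.intent_out = nn.Linear(2 * hidden_dim, n_intents)  # intent detection head

    def forward(self, x):                       # x: (batch, seq_len, feat_dim)
        h, _ = self.rnn(x)                      # (batch, seq_len, 2*hidden_dim)
        slot_logits = self.slot_out(h)          # per-token slot scores
        intent_logits = self.intent_out(h.mean(dim=1))  # pooled utterance scores
        return slot_logits, intent_logits

class EnsembleMTM(nn.Module):
    """Multi-task ensemble: BiLSTM and BiGRU base classifiers whose outputs
    are combined by a small task-specific MLP (one for slots, one for intent)."""
    def __init__(self, feat_dim, hidden_dim, n_slots, n_intents, mlp_dim=64):
        super().__init__()
        self.base = nn.ModuleList([
            BiRNNTagger(feat_dim, hidden_dim, n_slots, n_intents, "lstm"),
            BiRNNTagger(feat_dim, hidden_dim, n_slots, n_intents, "gru"),
        ])
        self.slot_mlp = nn.Sequential(
            nn.Linear(2 * n_slots, mlp_dim), nn.ReLU(),
            nn.Linear(mlp_dim, n_slots))
        self.intent_mlp = nn.Sequential(
            nn.Linear(2 * n_intents, mlp_dim), nn.ReLU(),
            nn.Linear(mlp_dim, n_intents))

    def forward(self, x):
        outs = [m(x) for m in self.base]
        # Concatenate the base classifiers' scores and let the MLPs combine them.
        slot_logits = self.slot_mlp(torch.cat([s for s, _ in outs], dim=-1))
        intent_logits = self.intent_mlp(torch.cat([i for _, i in outs], dim=-1))
        return slot_logits, intent_logits
```

A usage sketch under the same assumptions: the input features are the concatenation of GloVe and word2vec vectors plus a one-hot PoS encoding (all dimensions here are placeholders, as are the ATIS-like label counts).

```python
feat_dim = 300 + 300 + 45                 # GloVe + word2vec + one-hot PoS (illustrative)
x = torch.randn(8, 20, feat_dim)          # batch of 8 utterances, 20 tokens each
model = EnsembleMTM(feat_dim, hidden_dim=128, n_slots=120, n_intents=21)
slot_logits, intent_logits = model(x)     # (8, 20, 120) and (8, 21)
```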
