Intent Detection for Spoken Language Understanding Using a Deep Ensemble Model

One of the significant tasks in spoken language understanding (SLU) is intent detection. In this paper, we propose a deep learning based ensemble model for intent detection. The outputs of different deep learning architectures, namely a convolutional neural network (CNN) and variants of recurrent neural networks (RNNs) such as long short-term memory (LSTM) and gated recurrent units (GRU), are combined using a multi-layer perceptron (MLP). The classifiers are trained on a combined word embedding representation obtained from both Word2Vec and GloVe. Our experiments on the benchmark ATIS dataset show state-of-the-art performance for intent detection.
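The ensemble described above can be sketched as follows: each base classifier (CNN, LSTM, GRU) produces a posterior distribution over intents, the posteriors are concatenated, and a small MLP maps the concatenation to the final intent prediction. This is a minimal numpy sketch, not the authors' implementation; the batch size, number of intents, hidden width, and random weights are all illustrative assumptions (in practice the MLP combiner would be trained, and the base posteriors would come from trained CNN/LSTM/GRU models over Word2Vec+GloVe embeddings).

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable softmax along the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical posteriors from three base classifiers (CNN, LSTM, GRU)
# for a batch of utterances; n_intents is illustrative, not the ATIS count.
n_utt, n_intents = 4, 18
p_cnn  = softmax(rng.normal(size=(n_utt, n_intents)))
p_lstm = softmax(rng.normal(size=(n_utt, n_intents)))
p_gru  = softmax(rng.normal(size=(n_utt, n_intents)))

# Ensemble input: concatenated base-model posteriors.
x = np.concatenate([p_cnn, p_lstm, p_gru], axis=1)   # shape (n_utt, 3 * n_intents)

# One-hidden-layer MLP combiner (random weights here; trained in practice).
W1 = rng.normal(scale=0.1, size=(x.shape[1], 32)); b1 = np.zeros(32)
W2 = rng.normal(scale=0.1, size=(32, n_intents));  b2 = np.zeros(n_intents)

h = np.maximum(0.0, x @ W1 + b1)   # ReLU hidden layer
p_ens = softmax(h @ W2 + b2)       # final intent posterior
pred = p_ens.argmax(axis=1)        # predicted intent index per utterance
```

Combining posteriors through a trained MLP, rather than simple voting or averaging, lets the ensemble learn which base model to trust for which intent.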
