A Deep Multi-task Model for Dialogue Act Classification, Intent Detection and Slot Filling

An essential component of any dialogue system is understanding the user's language, a task known as spoken language understanding (SLU). Dialogue act classification (DAC), intent detection (ID) and slot filling (SF) are core tasks in a dialogue system. In this paper, we propose a deep learning-based multi-task model that performs DAC, ID and SF jointly. The model is built on a deep bi-directional recurrent neural network (RNN), using either long short-term memory (LSTM) or gated recurrent unit (GRU) cells. For DAC and ID, attention is applied to the LSTM/GRU output, and each attention output is fed to a task-specific dense layer. For SF, the LSTM/GRU output is fed to a softmax layer. Experiments on three datasets (ATIS, TRAINS and FRAMES) show that the proposed multi-task model performs better than the individual (single-task) models as well as all the pipeline models. The results also show that our attention-based multi-task model outperforms state-of-the-art approaches for these SLU tasks. For DAC, we achieve an improvement of more than 2% over the individual model on all three datasets. For ID, the improvement is 1% on ATIS and more than 3% on TRAINS and FRAMES. For SF, the improvement over the individual models is 0.8% on ATIS and 4% on TRAINS and FRAMES. The statistical significance of these results is validated using t-tests.
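The shared-encoder, three-head design described in the abstract can be sketched as a forward pass in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses a GRU encoder (an LSTM could be swapped in) and omits bias terms. For each of DAC and ID it assumes additive attention of the form score_t = u · tanh(W h_t). It also assumes a single dense layer plus softmax per classification head, and an equal-weight sum of the three cross-entropy losses. The names `GRU`, `MultiTaskSLU` and `joint_loss` and the toy dimensions are hypothetical. Training (gradients, optimizer) is out of scope.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add(*vs):
    """Element-wise sum of equal-length vectors."""
    return [sum(xs) for xs in zip(*vs)]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class GRU:
    """Hypothetical one-direction GRU without biases: h_t = z*h_{t-1} + (1-z)*n."""
    def __init__(self, in_dim, hid, rng):
        self.hid = hid
        self.W = {g: rand_matrix(hid, in_dim, rng) for g in "zrn"}  # input weights
        self.U = {g: rand_matrix(hid, hid, rng) for g in "zrn"}     # recurrent weights

    def run(self, xs):
        h, outputs = [0.0] * self.hid, []
        for x in xs:
            z = [sigmoid(v) for v in add(matvec(self.W["z"], x), matvec(self.U["z"], h))]
            r = [sigmoid(v) for v in add(matvec(self.W["r"], x), matvec(self.U["r"], h))]
            rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate applied to previous state
            n = [math.tanh(v) for v in add(matvec(self.W["n"], x), matvec(self.U["n"], rh))]
            h = [zi * hi + (1 - zi) * ni for zi, hi, ni in zip(z, h, n)]
            outputs.append(h)
        return outputs

class MultiTaskSLU:
    """Shared BiGRU encoder; attention + dense head for DAC and ID; per-token softmax for SF."""
    def __init__(self, emb_dim, hid, n_acts, n_intents, n_slots, seed=0):
        rng = random.Random(seed)
        self.fwd, self.bwd = GRU(emb_dim, hid, rng), GRU(emb_dim, hid, rng)
        d = 2 * hid  # forward and backward states are concatenated
        self.att = {task: (rand_matrix(d, d, rng), [rng.uniform(-0.1, 0.1) for _ in range(d)])
                    for task in ("dac", "id")}  # separate attention parameters per task
        self.dense = {"dac": rand_matrix(n_acts, d, rng), "id": rand_matrix(n_intents, d, rng)}
        self.slot_out = rand_matrix(n_slots, d, rng)

    def encode(self, xs):
        f = self.fwd.run(xs)
        b = self.bwd.run(xs[::-1])[::-1]  # re-align backward states with token order
        return [hf + hb for hf, hb in zip(f, b)]  # list concatenation, length 2*hid

    def attend(self, H, task):
        W, u = self.att[task]
        scores = [sum(ui * math.tanh(v) for ui, v in zip(u, matvec(W, h))) for h in H]
        alpha = softmax(scores)  # one attention weight per token
        return [sum(a * h[k] for a, h in zip(alpha, H)) for k in range(len(H[0]))]

    def forward(self, xs):
        H = self.encode(xs)
        p_act = softmax(matvec(self.dense["dac"], self.attend(H, "dac")))
        p_intent = softmax(matvec(self.dense["id"], self.attend(H, "id")))
        p_slots = [softmax(matvec(self.slot_out, h)) for h in H]  # one distribution per token
        return p_act, p_intent, p_slots

def joint_loss(preds, act, intent, slots):
    """Unweighted sum of the three cross-entropy losses (an assumed weighting)."""
    p_act, p_intent, p_slots = preds
    loss = -math.log(p_act[act]) - math.log(p_intent[intent])
    return loss - sum(math.log(p[s]) for p, s in zip(p_slots, slots))

# Five random vectors stand in for the word embeddings of one utterance.
rng = random.Random(1)
tokens = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(5)]
model = MultiTaskSLU(emb_dim=8, hid=6, n_acts=4, n_intents=3, n_slots=7)
preds = model.forward(tokens)
print(joint_loss(preds, act=0, intent=2, slots=[0, 0, 3, 0, 5]))
```

The two utterance-level tasks (DAC, ID) and the token-level task (SF) all read from the same encoder states. During training, the gradients of all three losses would therefore update the shared recurrent weights. This sharing is the mechanism through which a multi-task model can let the tasks inform one another.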

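The abstract reports t-tests without naming the variant or the samples compared. The sketch below assumes an unequal-variance (Welch) two-sample test over per-run scores of two systems. An example would be several training runs of the multi-task model compared against the individual model. Under that assumption, the statistic and its degrees of freedom can be computed with the standard library alone. The function name `welch_t` and the toy inputs are hypothetical and are not results from the paper.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Toy per-run scores, purely illustrative.
t, df = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t, 3), round(df, 3))  # -> -3.674 4.0
```

The resulting |t| would then be compared against the critical value of the t distribution with `df` degrees of freedom. In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same Welch test and also returns the p-value. The standard library cannot compute that p-value directly because it has no t-distribution CDF.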