A survey of joint intent detection and slot-filling models in natural language understanding

Intent classification and slot filling are two critical tasks in natural language understanding. Traditionally, the two tasks have been treated as independent. More recently, however, joint models for intent classification and slot filling have achieved state-of-the-art performance and have shown that the two tasks are strongly related. This article is a compilation of past work in natural language understanding, with a focus on joint intent classification and slot filling. We observe three milestones in this research so far: intent detection, which identifies the speaker’s intention; slot filling, which labels each word token in the speech or text; and, finally, joint intent classification and slot filling. We describe trends, approaches, issues, datasets, and evaluation metrics in intent classification and slot filling. We also discuss representative performance values, describe shared tasks, and provide pointers to future work, as given in prior works. To interpret the state-of-the-art trends, we provide multiple tables that describe and summarise past research along different dimensions, including the types of features, base approaches, and dataset domains used.
