Unified Parallel Intent and Slot Prediction with Cross Fusion and Slot Masking

In Automatic Speech Recognition applications, Natural Language Processing (NLP) involves the sub-tasks of predicting the intent and the slots of the utterance spoken by the user. Researchers have explored Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and attention-based models for these tasks. However, prior approaches either train separate, independent models for intent and slot prediction or use sequence-to-sequence networks, and therefore may not take full advantage of the relationship between intent and slot learning. We propose a unified parallel architecture in which a CNN predicts the intent and a bidirectional LSTM predicts the slots. A cross-fusion technique establishes the connection between the intent and slot representations so that each task informs the other, and we apply slot masking alongside cross fusion for slot prediction. Our models surpass existing state-of-the-art results for both intent and slot prediction on two open datasets.
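
To make the described architecture concrete, the following is a minimal PyTorch sketch of a parallel CNN (intent) + BiLSTM (slot) model with a simple additive cross fusion and a masking step over the slot logits. All layer sizes, kernel sizes, the exact fusion form, and names such as ParallelIntentSlotModel are illustrative assumptions; the abstract does not specify these details, so this is one plausible realization rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class ParallelIntentSlotModel(nn.Module):
    """Hypothetical sketch: parallel intent (CNN) and slot (BiLSTM)
    branches over shared embeddings, joined by cross fusion."""

    def __init__(self, vocab_size, embed_dim=300, hidden_dim=128,
                 num_intents=21, num_slots=120, kernel_sizes=(2, 3, 4)):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Intent branch: 1-D convolutions over token embeddings.
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, hidden_dim, k) for k in kernel_sizes)
        # Slot branch: bidirectional LSTM over the same embeddings.
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        # Cross fusion: each branch receives a projection of the
        # other branch's features before its classifier head
        # (an assumed fusion form; the paper's may differ).
        cnn_dim = hidden_dim * len(kernel_sizes)
        self.slot_to_intent = nn.Linear(2 * hidden_dim, cnn_dim)
        self.intent_to_slot = nn.Linear(cnn_dim, 2 * hidden_dim)
        self.intent_head = nn.Linear(cnn_dim, num_intents)
        self.slot_head = nn.Linear(2 * hidden_dim, num_slots)

    def forward(self, tokens, slot_mask=None):
        x = self.embedding(tokens)                       # (B, T, E)
        # Intent branch: max-pool each filter map over time.
        c = torch.cat([conv(x.transpose(1, 2)).max(dim=2).values
                       for conv in self.convs], dim=1)   # (B, cnn_dim)
        # Slot branch: per-token hidden states.
        h, _ = self.bilstm(x)                            # (B, T, 2H)
        # Cross fusion between the two branches.
        intent_feat = c + self.slot_to_intent(h.mean(dim=1))
        slot_feat = h + self.intent_to_slot(c).unsqueeze(1)
        intent_logits = self.intent_head(intent_feat)    # (B, num_intents)
        slot_logits = self.slot_head(slot_feat)          # (B, T, num_slots)
        # Slot masking (assumed form): suppress disallowed slot labels,
        # e.g. labels incompatible with the utterance, via a boolean
        # mask broadcastable to (B, T, num_slots).
        if slot_mask is not None:
            slot_logits = slot_logits.masked_fill(~slot_mask, -1e9)
        return intent_logits, slot_logits
```

The design choice sketched here is that both branches read the same embeddings in parallel, rather than one feeding the other sequentially, and the fusion happens just before the classifier heads so that intent and slot learning can inform each other during training.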
