Variational Inference-Based Dropout in Recurrent Neural Networks for Slot Filling in Spoken Language Understanding

This paper generalizes the variational recurrent neural network (RNN), in which variational inference (VI)-based dropout regularization is applied to long short-term memory (LSTM) cells, to more advanced RNN architectures such as the gated recurrent unit (GRU) and the bi-directional LSTM/GRU. The new variational RNNs are employed for slot filling, an intriguing but challenging task in spoken language understanding. Experiments on the ATIS dataset suggest that variational RNNs with VI-based dropout regularization significantly outperform baseline RNN systems with naive dropout regularization in terms of F-measure. In particular, the variational RNN with the bi-directional LSTM/GRU obtains the best F-measure.
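As a concrete illustration of the VI-based dropout the paper builds on (Gal and Ghahramani's variational dropout), the minimal PyTorch sketch below samples one dropout mask per sequence and reuses it at every timestep, on both the input and the recurrent hidden state; naive dropout would instead sample a fresh mask at each timestep. The class name, argument names, and dropout rate are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class VariationalDropoutLSTM(nn.Module):
    """Sketch of VI-based (variational) dropout for an LSTM: the same
    dropout masks are drawn once per sequence and reused at every
    timestep on inputs and recurrent states. Illustrative only."""

    def __init__(self, input_size, hidden_size, dropout=0.5):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.hidden_size = hidden_size
        self.dropout = dropout

    def forward(self, x):  # x: (seq_len, batch, input_size)
        seq_len, batch, _ = x.shape
        h = x.new_zeros(batch, self.hidden_size)
        c = x.new_zeros(batch, self.hidden_size)
        if self.training:
            # One mask per sequence (not per timestep), the key
            # difference from naive dropout; scaled inverted dropout.
            keep = 1.0 - self.dropout
            mask_x = x.new_empty(batch, x.size(2)).bernoulli_(keep) / keep
            mask_h = x.new_empty(batch, self.hidden_size).bernoulli_(keep) / keep
        else:
            mask_x = mask_h = 1.0
        outputs = []
        for t in range(seq_len):
            # Identical masks applied at every step t.
            h, c = self.cell(x[t] * mask_x, (h * mask_h, c))
            outputs.append(h)
        return torch.stack(outputs)  # (seq_len, batch, hidden_size)
```

Applying the same per-sequence masking scheme to an `nn.GRUCell`, or running two such layers over the sequence in opposite directions and concatenating their outputs, would yield the GRU and bi-directional variants the paper evaluates.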
