Comparative Study of Parametric and Representation Uncertainty Modeling for Recurrent Neural Network Language Models

Recurrent neural network language models (RNNLMs) have shown superior performance across a range of tasks, including speech recognition. The hidden layer of an RNNLM plays a vital role in learning suitable representations of contexts for word prediction. However, the deterministic model parameters and fixed hidden vectors of conventional RNNLMs have limited power to model the uncertainty over hidden representations. To address this issue, this paper presents a comparative study of parametric and hidden representation uncertainty modeling approaches, based on Bayesian gates and variational RNNLMs respectively, applied to long short-term memory (LSTM) and gated recurrent unit (GRU) LMs. Experimental results are reported on two tasks: the Penn Treebank (PTB) corpus and the Switchboard (SWBD) conversational telephone speech corpus. Consistent performance improvements over conventional RNNLMs were obtained in terms of both perplexity and word error rate.
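To make the contrast between the two directions concrete: parametric uncertainty modeling places a distribution over the network weights (for example, the gate parameters), whereas representation uncertainty modeling places a distribution over the hidden vector itself. The snippet below is a minimal, hypothetical PyTorch sketch of the first direction only: an LSTM-style sigmoid gate whose weight matrix has a factorised Gaussian posterior sampled with the reparameterization trick. The class name, initialization values, and standard-normal prior are illustrative assumptions, not the paper's exact implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianGate(nn.Module):
    """Sigmoid gate whose weights are drawn from N(mu, sigma^2) per forward pass."""
    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        # Variational posterior parameters for the gate weight matrix.
        self.w_mu = nn.Parameter(torch.zeros(hidden_dim, input_dim + hidden_dim))
        self.w_rho = nn.Parameter(torch.full((hidden_dim, input_dim + hidden_dim), -5.0))
        self.bias = nn.Parameter(torch.zeros(hidden_dim))

    def forward(self, x, h):
        # sigma = softplus(rho) keeps the standard deviation positive.
        sigma = F.softplus(self.w_rho)
        # Reparameterization trick: w = mu + sigma * eps, with eps ~ N(0, I).
        w = self.w_mu + sigma * torch.randn_like(sigma)
        return torch.sigmoid(F.linear(torch.cat([x, h], dim=-1), w, self.bias))

    def kl_to_standard_normal(self):
        # KL(q(w) || N(0, I)), added to the training loss as a regularizer.
        sigma = F.softplus(self.w_rho)
        return 0.5 * torch.sum(sigma**2 + self.w_mu**2 - 1.0 - 2.0 * torch.log(sigma))

# Usage: one stochastic forward pass of the gate.
gate = BayesianGate(input_dim=8, hidden_dim=4)
x, h = torch.randn(2, 8), torch.randn(2, 4)
print(gate(x, h).shape, gate.kl_to_standard_normal().item())

A representation-uncertainty counterpart (the variational RNNLM direction) would instead keep the weights deterministic and have the network predict a mean and variance for the hidden vector, sampling it with the same reparameterization trick and adding an analogous KL term to the loss.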
