Verifying the long-range dependency of RNN language models

It has been argued that recurrent neural network (RNN) language models are better at capturing long-range dependencies than n-gram language models. In this paper, we attempt to verify this claim by investigating the prediction accuracy and perplexity of these language models as a function of word position, i.e., the position of a word within a sentence. If the claim holds, the advantage of RNN language models over n-gram language models should become increasingly evident as word position increases. On the Penn Treebank (PTB) corpus, an RNN language model outperforms a trigram language model in both perplexity and word prediction accuracy. On the AMI meeting corpus, however, the trigram language model outperforms the RNN language model.
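To make the position-dependent evaluation concrete, the following is a minimal sketch of how perplexity can be bucketed by word position, assuming per-token log-probabilities have already been extracted from each language model; the function name, input format, and sample numbers are illustrative assumptions, not the paper's actual tooling or results.

```python
import math
from collections import defaultdict

def perplexity_by_position(sentences_logprobs, max_pos=30):
    """Bucket per-token log-probabilities (natural log) by word position
    and return a dict mapping position -> perplexity.
    `sentences_logprobs` is a list of lists: one list of token
    log-probabilities per sentence (hypothetical input format)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for sent in sentences_logprobs:
        for pos, lp in enumerate(sent, start=1):
            if pos > max_pos:
                break
            sums[pos] += lp
            counts[pos] += 1
    # Perplexity at position p = exp(-average log-probability of tokens at p).
    return {p: math.exp(-sums[p] / counts[p]) for p in sorted(counts)}

if __name__ == "__main__":
    # Toy scores standing in for trigram and RNN LM outputs (illustrative only).
    trigram_lp = [[-2.3, -4.1, -3.8], [-1.9, -3.5, -4.0, -3.2]]
    rnn_lp = [[-2.5, -3.9, -3.1], [-2.0, -3.3, -3.6, -2.8]]
    for name, scores in [("trigram", trigram_lp), ("rnn", rnn_lp)]:
        for p, ppl in perplexity_by_position(scores).items():
            print(name, "position", p, "perplexity", round(ppl, 1))
```

Comparing the two resulting per-position curves is one way to check whether the RNN model's advantage grows with word position, as the claim would predict.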
