Extensions of recurrent neural network language model

We present several modifications of the original recurrent neural network language model (RNN LM). While this model has been shown to significantly outperform many competitive language modeling techniques in terms of accuracy, its remaining problem is computational complexity. In this work, we show approaches that lead to a more than 15-fold speedup for both the training and testing phases. Next, we show the importance of using the backpropagation through time algorithm. An empirical comparison with feedforward networks is also provided. Finally, we discuss possibilities for reducing the number of parameters in the model. The resulting RNN model can thus be smaller, faster during both training and testing, and more accurate than the basic one.
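The abstract does not name the specific speedup approaches. One widely used way to cut the dominant output-layer cost of a neural LM is a class-based (factorized) softmax, where P(w|h) = P(class(w)|h) · P(w|class(w), h), so only the C class scores plus the scores of the words in one class are evaluated instead of all V word scores. The sketch below illustrates this under assumed toy sizes; the contiguous word-to-class assignment, dimensions, and variable names are illustrative, not taken from the paper.

```python
# A minimal sketch of a class-based (factorized) output layer for an NN LM.
# Evaluating P(w|h) needs only O(C + |class|) logits instead of O(V).
import numpy as np

V, H = 10_000, 128                         # assumed toy vocabulary/hidden sizes
C = int(np.sqrt(V))                        # ~sqrt(V) classes minimizes the cost
word2class = np.arange(V) // (V // C + 1)  # trivial contiguous class assignment
class_members = [np.flatnonzero(word2class == c) for c in range(C)]  # precomputed

rng = np.random.default_rng(0)
Wc = rng.normal(0, 0.1, (C, H))            # hidden -> class logits
Ww = rng.normal(0, 0.1, (V, H))            # hidden -> word logits

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def log_prob(word, h):
    """log P(word | h) with the factorized output layer."""
    c = word2class[word]
    p_class = softmax(Wc @ h)              # O(C) instead of O(V)
    members = class_members[c]             # words sharing this class (sorted)
    p_word = softmax(Ww[members] @ h)      # O(|class|)
    idx = np.searchsorted(members, word)   # position of `word` inside its class
    return np.log(p_class[c]) + np.log(p_word[idx])

h = rng.normal(0, 0.1, H)                  # a stand-in hidden state
print(log_prob(1234, h))
```

With C ≈ sqrt(V), the per-word output cost drops from O(V·H) to roughly O(2·sqrt(V)·H), which is where speedups of the reported order of magnitude typically come from.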
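Since the abstract highlights the role of backpropagation through time (BPTT), the following is a minimal sketch of truncated BPTT for a simple Elman-style RNN LM in numpy. All dimensions, the learning rate, and the truncation depth `bptt_steps` are illustrative assumptions, not values from the paper.

```python
# A minimal sketch of truncated backpropagation through time (BPTT):
# gradients are unfolded through the recurrence for at most `bptt_steps`
# time steps instead of the full sequence.
import numpy as np

V, H, bptt_steps, lr = 100, 32, 4, 0.1      # assumed toy hyperparameters
rng = np.random.default_rng(0)
U = rng.normal(0, 0.1, (H, V))              # input (one-hot word) -> hidden
W = rng.normal(0, 0.1, (H, H))              # hidden -> hidden (recurrence)
Vo = rng.normal(0, 0.1, (V, H))             # hidden -> output logits

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def train_step(tokens):
    """One SGD step over a token sequence with truncated BPTT."""
    hs = {-1: np.zeros(H)}                  # hidden states, h_{-1} = 0
    ps = {}                                 # predicted distributions
    # forward pass: predict token t+1 from tokens up to t
    for t in range(len(tokens) - 1):
        hs[t] = np.tanh(U[:, tokens[t]] + W @ hs[t - 1])
        ps[t] = softmax(Vo @ hs[t])
    dU, dW, dVo = np.zeros_like(U), np.zeros_like(W), np.zeros_like(Vo)
    # backward pass with truncation
    for t in range(len(tokens) - 1):
        dy = ps[t].copy()
        dy[tokens[t + 1]] -= 1.0            # d(cross-entropy)/d(logits)
        dVo += np.outer(dy, hs[t])
        dh = Vo.T @ dy
        # unfold the recurrence at most `bptt_steps` steps back in time
        for k in range(t, max(t - bptt_steps, -1), -1):
            dz = dh * (1.0 - hs[k] ** 2)    # backprop through tanh
            dU[:, tokens[k]] += dz
            dW += np.outer(dz, hs[k - 1])
            dh = W.T @ dz                   # pass gradient to previous step
    for P, dP in ((U, dU), (W, dW), (Vo, dVo)):
        P -= lr * dP                        # plain SGD update (in place)

train_step(rng.integers(0, V, size=20))     # toy usage on random tokens
```

Setting `bptt_steps = 1` reduces this to ordinary backpropagation on the unrolled single step; larger values let the model learn longer-range dependencies at proportionally higher training cost.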
