Recurrent neural network based language model

A new recurrent neural network based language model (RNN LM) with applications to speech recognition is presented. Results indicate that it is possible to obtain around 50% reduction of perplexity by using mixture of several RNN LMs, compared to a state of the art backoff language model. Speech recognition experiments show around 18% reduction of word error rate on the Wall Street Journal task when comparing models trained on the same amount of data, and around 5% on the much harder NIST RT05 task, even when the backoff model is trained on much more data than the RNN LM. We provide ample empirical evidence to suggest that connectionist language models are superior to standard n-gram techniques, except their high computational (training) complexity. Index Terms: language modeling, recurrent neural networks, speech recognition

[1]  Jeffrey L. Elman,et al.  Finding Structure in Time , 1990, Cogn. Sci..

[2]  Yoshua Bengio,et al.  Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.

[3]  Matthew V. Mahoney,et al.  Text Compression as a Test for Artificial Intelligence , 1999, AAAI/IAAI.

[4]  Matthew V. Mahoney,et al.  Fast Text Compression with Neural Networks , 2000, FLAIRS Conference.

[5]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[6]  Joshua Goodman,et al.  A bit of progress in language modeling , 2001, Comput. Speech Lang..

[7]  Mikael Bodén,et al.  A guide to recurrent neural networks and backpropagation , 2001 .

[8]  Jean-Luc Gauvain,et al.  Training Neural Network Language Models on Very Large Corpora , 2005, HLT.

[9]  Yoshua Bengio,et al.  Hierarchical Probabilistic Neural Network Language Model , 2005, AISTATS.

[10]  Lukás Burget,et al.  The 2005 AMI System for the Transcription of Speech in Meetings , 2005, MLMI.

[11]  Lukás Burget,et al.  The AMI System for the Transcription of Speech in Meetings , 2007, ICASSP.

[12]  Yoshua Bengio,et al.  Adaptive Importance Sampling to Accelerate Training of a Neural Probabilistic Language Model , 2008, IEEE Transactions on Neural Networks.

[13]  Sanjeev Khudanpur,et al.  Self-supervised discriminative training of statistical language models , 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.

[14]  Lukás Burget,et al.  Neural network based language models for highly inflective languages , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[15]  Mary P. Harper,et al.  A Joint Language Model With Fine-grain Syntactic Tags , 2009, EMNLP.