CUED-RNNLM — An open-source toolkit for efficient training and evaluation of recurrent neural network language models

In recent years, recurrent neural network language models (RNNLMs) have become increasingly popular for a range of applications including speech recognition. However, RNNLM training is computationally expensive, which limits both the quantity of training data and the size of network that can be used. Fully exploiting the power of RNNLMs therefore requires efficient training implementations. This paper introduces an open-source toolkit, the CUED-RNNLM toolkit, which supports efficient GPU-based training of RNNLMs. In contrast to existing tools, which use class-based output targets, training with a large number of word-level output targets is supported. Support for N-best and lattice-based rescoring of both HTK and Kaldi format lattices is included. An example of building and evaluating RNNLMs with this toolkit is presented for a Kaldi-based speech recognition system using the AMI corpus. All necessary resources, including the source code, documentation and recipe, are available online.
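To make the word-level output layer concrete, below is a minimal sketch of a single RNNLM training step with a full softmax over the entire vocabulary, as opposed to a class-based factorisation of the output layer. It is illustrative only: the names, dimensions and plain-SGD update are assumptions chosen for the example, not the CUED-RNNLM implementation (which accelerates exactly this vocabulary-wide computation on GPU).

import numpy as np

# Assumed sizes for illustration: vocabulary V and hidden layer H.
V, H = 10000, 128
rng = np.random.default_rng(0)
E = rng.normal(0.0, 0.1, (V, H))   # input word embeddings
W = rng.normal(0.0, 0.1, (H, H))   # recurrent weights
U = rng.normal(0.0, 0.1, (H, V))   # output projection: one column per word

def train_step(h_prev, w_in, w_target, lr=0.1):
    """One forward/backward step: predict w_target given the history state."""
    h = np.tanh(E[w_in] + h_prev @ W)      # recurrent hidden state
    logits = h @ U                         # scores for ALL V output words
    p = np.exp(logits - logits.max())
    p /= p.sum()                           # full softmax over the vocabulary
    loss = -np.log(p[w_target])            # cross-entropy for the target word
    # Gradient of the cross-entropy w.r.t. the logits, then a naive SGD
    # update of the output layer only (truncated BPTT through W omitted).
    dlogits = p.copy()
    dlogits[w_target] -= 1.0
    U[:] -= lr * np.outer(h, dlogits)      # V-wide update: the costly part
    return h, loss

# Toy usage: feed a two-word history, carrying the hidden state forward.
h = np.zeros(H)
for w_in, w_target in [(5, 42), (42, 7)]:
    h, loss = train_step(h, w_in, w_target)

The outer product over all V output words is what makes word-level training expensive relative to class-based output layers, and it is the step that batched GPU training amortises.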

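The rescoring mode can be pictured with a short sketch as well. The following is a hedged example of N-best rescoring in which each hypothesis's acoustic score is combined with an interpolated n-gram/RNNLM score and the list is re-ranked. The function names, the sentence-level interpolation and the scale factors are assumptions made for illustration, not the toolkit's actual interface (which also supports direct rescoring of HTK and Kaldi format lattices).

from typing import Callable, List, Tuple

def rescore_nbest(
    hyps: List[Tuple[float, List[str]]],      # (acoustic log-score, words)
    ngram_lp: Callable[[List[str]], float],   # n-gram sentence log-prob
    rnnlm_lp: Callable[[List[str]], float],   # RNNLM sentence log-prob
    lam: float = 0.5,                         # LM interpolation weight
    lm_scale: float = 12.0,                   # language model scale factor
    wip: float = 0.0,                         # word insertion penalty
) -> List[Tuple[float, List[str]]]:
    """Return the hypotheses sorted by combined score, best first."""
    rescored = []
    for ac, words in hyps:
        # In practice the n-gram and RNNLM probabilities are interpolated
        # per word in the probability domain; combining sentence-level
        # log-probs as done here is a simplification for brevity.
        lm = lam * rnnlm_lp(words) + (1.0 - lam) * ngram_lp(words)
        total = ac + lm_scale * lm + wip * len(words)
        rescored.append((total, words))
    return sorted(rescored, key=lambda t: t[0], reverse=True)

# Toy usage with assumed placeholder scoring functions:
best = rescore_nbest(
    [(-120.0, ["the", "cat", "sat"]), (-118.0, ["a", "cat", "sat"])],
    ngram_lp=lambda w: -2.0 * len(w),
    rnnlm_lp=lambda w: -1.8 * len(w),
)[0]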