Discriminative method for recurrent neural network language models

A recurrent neural network language model (RNN-LM) can exploit a longer word context than an n-gram language model, and its effectiveness has recently been demonstrated on automatic speech recognition (ASR) tasks. However, the training criterion of an RNN-LM is based on the cross entropy (CE) between predicted and reference words. Unlike the discriminative training of acoustic models and discriminative language models (DLMs), this criterion does not explicitly consider discriminative measures computed from ASR hypotheses and references. This paper proposes a discriminative training method for RNN-LMs that adds a discriminative criterion to CE. We use the log-likelihood ratio of the ASR hypotheses and references as the discriminative criterion. The proposed training criterion emphasizes the contribution of misrecognized words relative to that of correctly recognized words, which are discounted during training. Experiments on a large vocabulary continuous speech recognition task show that the proposed method improves on the RNN-LM baseline. Moreover, combining the proposed discriminative RNN-LM with a DLM yields further gains.
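As a rough illustration of how such a criterion can be combined with CE (a hypothetical sketch, not the paper's exact formulation; the interpolation weight $\lambda$ and the hypothesis set $\mathcal{H}$ are assumptions introduced here), the objective for model parameters $\theta$ might take the form

\[
\mathcal{L}(\theta) \;=\; \mathcal{L}_{\mathrm{CE}}(\theta) \;+\; \lambda \sum_{W_{\mathrm{hyp}} \in \mathcal{H}} \log \frac{P_\theta(W_{\mathrm{hyp}})}{P_\theta(W_{\mathrm{ref}})},
\]

where minimizing the second term lowers the likelihood of competing ASR hypotheses $W_{\mathrm{hyp}}$ relative to the reference word sequence $W_{\mathrm{ref}}$. Under this form, words that appear only in erroneous hypotheses receive larger gradient contributions than words the recognizer already transcribes correctly, matching the emphasis described above.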
