Acoustically discriminative language model training with pseudo-hypothesis

Recently proposed methods for discriminative language modeling require alternate hypotheses in the form of lattices or N-best lists. These are usually generated by an Automatic Speech Recognition (ASR) system on the same speech data used to train the system. This requirement restricts the scope of these methods to corpora where both the acoustic material and the corresponding true transcripts are available. Typically, the text data available for language model (LM) training is an order of magnitude larger than the manually transcribed speech data. This paper provides a general framework for exploiting this larger volume of text in the discriminative training of language models. We propose to generate probable N-best lists directly from the text material. By incorporating phonetic confusability estimated from the acoustic model of the ASR system, these lists resemble the N-best lists that the ASR system itself would produce. We present experiments on Japanese spontaneous lecture speech data, which demonstrate that discriminative LM training with the proposed framework is effective and provides modest gains in ASR accuracy.
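The idea of producing N-best-like lists from text alone can be illustrated with a toy sketch. This is not the authors' implementation: the confusion table, its probabilities, and the function name `pseudo_nbest` are all hypothetical, and the sketch substitutes whole words from a hand-written table, whereas the abstract describes confusability estimated from the acoustic model. The exhaustive enumeration is only practical for very short sentences.

```python
import heapq
from itertools import product
from math import log

# Hypothetical confusion table: word -> [(confusable word, probability)].
# The values are invented for illustration; in the setting described above
# they would be derived from the acoustic model, not written by hand.
CONFUSIONS = {
    "write": [("right", 0.3), ("ride", 0.1)],
    "two": [("too", 0.4)],
}


def pseudo_nbest(sentence, confusions, n=3):
    """Return the n most probable confusable variants of a reference sentence.

    Each result is a (log_score, hypothesis) pair, best first. The reference
    sentence itself is excluded, so every hypothesis contains an "error".
    """
    words = sentence.split()

    # For each position, list the original word (with the leftover
    # probability mass) followed by its confusable alternatives.
    options = []
    for word in words:
        alternatives = confusions.get(word, [])
        keep = 1.0 - sum(p for _, p in alternatives)
        if keep <= 0.0:
            raise ValueError(f"confusion probabilities for {word!r} must sum to < 1")
        options.append([(word, keep)] + alternatives)

    # Enumerate every combination (exponential; acceptable only for a toy).
    scored = []
    for combo in product(*options):
        hypothesis = " ".join(w for w, _ in combo)
        if hypothesis == sentence:
            continue
        score = sum(log(p) for _, p in combo)
        scored.append((score, hypothesis))

    return heapq.nlargest(n, scored)


if __name__ == "__main__":
    for score, hyp in pseudo_nbest("write two words", CONFUSIONS):
        print(f"{score:.3f}  {hyp}")
```

Pairs of a reference sentence and such pseudo-hypotheses could then stand in for ASR output as training examples for a discriminative LM, which is what lets the approach draw on text for which no audio exists.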
