A frame level boosting training scheme for acoustic modeling

Conventional Boosting algorithms for acoustic modeling have two notable weaknesses. (1) The objective function minimizes utterance error rate, whereas most speech recognition systems aim to reduce word error rate. (2) During Boosting training, the utterance is the unit of resampling, so every frame within an utterance is assigned the same weight. Intuitively, the frames associated with a misclassified word should receive more emphasis than the others. We propose a frame-level Boosting training scheme that addresses these shortcomings by allowing each frame to have a different weight. We describe the technique and report experimental results for this approach.
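
The intuition about per-frame emphasis can be illustrated with a minimal sketch. It is not the update rule of the proposed scheme, which the abstract does not specify. It assumes a forced alignment that maps each frame to a word index, a set of word indices marked as misrecognized, and a simple multiplicative boost followed by renormalization. The function name `update_frame_weights` and the parameter `boost` are hypothetical.

```python
from typing import List, Sequence, Set


def update_frame_weights(
    weights: Sequence[float],
    frame_word_ids: Sequence[int],
    misrecognized: Set[int],
    boost: float = 2.0,
) -> List[float]:
    """Up-weight frames aligned to misrecognized words, then renormalize.

    Assumption: a fixed multiplicative factor stands in for whatever
    weight update the actual scheme uses.
    """
    if len(weights) != len(frame_word_ids):
        raise ValueError("weights and frame_word_ids must have equal length")
    if boost <= 0:
        raise ValueError("boost must be positive")

    # Scale only the frames whose aligned word was misrecognized.
    scaled = [
        w * boost if word_id in misrecognized else w
        for w, word_id in zip(weights, frame_word_ids)
    ]

    # Renormalize so the weights again form a distribution over frames.
    total = sum(scaled)
    return [w / total for w in scaled]


# Six frames of one utterance, aligned to three words (indices 0, 1, 2).
frame_word_ids = [0, 0, 1, 1, 1, 2]
weights = [1.0 / 6] * 6

# Suppose word 1 was misrecognized: only its three frames gain weight.
new_weights = update_frame_weights(weights, frame_word_ids, {1}, boost=2.0)
```

Under utterance-level Boosting all six frames would share one weight. Here the frames of the misrecognized word end up with twice the weight of the others, which is the kind of per-frame distinction the abstract argues for.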
