On a generalization of margin-based discriminative training to robust speech recognition

Recently, there have been intensive studies of margin-based learning for automatic speech recognition (ASR). It is our believe that by securing a margin from the decision boundaries to the training samples, a correct decision can still be made if the mismatches between testing and training samples are well within the tolerance region specified by the margin. This nice property should be effective for robust ASR, where the testing condition is different from those in training. In this paper, we report on experiment results with soft margin estimation (SME) on the Aurora2 task and show that SME is very effective under clean training with more than 50% relative word error reductions in the clean, 20db, and 15db testing conditions, and still gives a slight improvement over conventional multi-condition training approaches. This demonstrates that the margin in SME can equip recognizers with a nice generalization property under adverse conditions.

[1]  Philip C. Woodland,et al.  Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models , 1995, Comput. Speech Lang..

[2]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[3]  Qiang Huo,et al.  An Environment-Compensated Minimum Classification Error Training Approach Based on Stochastic Vector Mapping , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[4]  S. Haykin,et al.  Pattern Recognition Using a Family of Design Algorithms Based upon the Generalized Probabilistic Descent Method , 2001 .

[5]  S. Boll,et al.  Suppression of acoustic noise in speech using spectral subtraction , 1979 .

[6]  Chin-Hui Lee,et al.  A maximum-likelihood approach to stochastic matching for robust speech recognition , 1996, IEEE Trans. Speech Audio Process..

[7]  Jinyu Li,et al.  Approximate Test Risk Bound Minimization Through Soft Margin Estimation , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[8]  Lawrence K. Saul,et al.  Large Margin Hidden Markov Models for Automatic Speech Recognition , 2006, NIPS.

[9]  B. Atal Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification. , 1974, The Journal of the Acoustical Society of America.

[10]  David Pearce,et al.  The aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions , 2000, INTERSPEECH.

[11]  Li Deng,et al.  Large-vocabulary speech recognition under adverse acoustic environments , 2000, INTERSPEECH.

[12]  Biing-Hwang Juang,et al.  Minimum classification error rate methods for speech recognition , 1997, IEEE Trans. Speech Audio Process..

[13]  Yifan Gong,et al.  Speech recognition in noisy environments: A survey , 1995, Speech Commun..

[14]  Yves Normandin,et al.  Hidden Markov models, maximum mutual information estimation, and the speech recognition problem , 1992 .

[15]  Hui Jiang,et al.  Large margin hidden Markov models for speech recognition , 2006, IEEE Transactions on Audio, Speech, and Language Processing.