Source generator equalization and enhancement of spectral properties for robust speech recognition in noise and stress

Studies have shown that, depending on speaker task and environmental conditions, speech recognizers are sensitive to noisy, stressful environments. This study aims to achieve robust recognition in diverse environmental conditions by formulating feature enhancement and stress equalization algorithms within the framework of source generator theory. The source generator framework is used as a means of modeling speech production variation under stressful speaking conditions. A multi-dimensional stress equalization procedure is formulated that produces recognition features less sensitive to stress-induced variation. A feature enhancement algorithm is employed, based on iterative techniques previously derived for enhancing speech in varying background noise environments. Combined stress equalization and feature enhancement reduces the average word error rate by 38.7% across 10 noisy stressful conditions (e.g., noisy loud, angry, and Lombard effect stress conditions). The results suggest that combining a flexible source generator framework to address stressed speaking conditions with a feature enhancement algorithm that adapts based on speech-specific constraints can be effective in reducing the effects of stress and noise for robust automatic recognition.
