Cepstral domain segmental feature vector normalization for noise robust speech recognition

Abstract To date, speech recognition systems have been deployed in real-world applications in which they must provide satisfactory recognition performance under various noise conditions. However, a mismatch between the training and testing conditions often causes a drastic decrease in the performance of these systems. In this paper, we propose a segmental feature vector normalization technique that makes an automatic speech recognition system more robust to environmental changes by normalizing the output of the signal-processing front-end to have similar segmental parameter statistics in all noise conditions. The viability of the proposed technique was verified in experiments with different background noises and microphones. In an isolated word recognition task, the proposed normalization technique reduced the error rates by over 70% in noisy conditions with respect to the baseline tests, and in a microphone mismatch case an error rate reduction of over 75% was achieved. In a multi-environment speaker-independent connected digit recognition task, the proposed method reduced the error rates by over 16%.
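To illustrate the general idea of normalizing front-end output to have similar segmental parameter statistics across conditions, the sketch below applies sliding-window mean and variance normalization to a sequence of cepstral feature vectors. This is only a minimal, hedged example of segmental normalization in general; the window length, the use of variance normalization, and the function name are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def segmental_cmvn(features, win_len=100):
    """Sliding-window cepstral mean and variance normalization.

    features : (num_frames, num_coeffs) array of cepstral vectors.
    win_len  : length of the local segment (in frames) used to
               estimate the normalization statistics (assumed value).
    """
    num_frames, _ = features.shape
    half = win_len // 2
    normalized = np.empty_like(features, dtype=float)
    for t in range(num_frames):
        # Segment boundaries, clipped at the utterance edges.
        lo = max(0, t - half)
        hi = min(num_frames, t + half + 1)
        segment = features[lo:hi]
        # Normalize the current frame with the local segment statistics,
        # so the output has similar statistics in all noise conditions.
        mean = segment.mean(axis=0)
        std = segment.std(axis=0) + 1e-8  # avoid division by zero
        normalized[t] = (features[t] - mean) / std
    return normalized
```

In use, such a routine would be applied to the cepstral (e.g. MFCC) matrix of each utterance before it is passed to the recognizer, both in training and in testing, so that clean and noisy data are mapped toward comparable parameter statistics.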
