Cepstral gain normalization for noise robust speech recognition

This paper describes a robust speech recognition technique that normalizes cepstral gains to remove the effects of additive noise. We assume that these effects can be approximated by a model consisting of gain and DC components in the log-spectrum. Accordingly, we propose cepstral gain normalization (CGN), which normalizes the gains by calculating the maximum and minimum values of the cepstral coefficients over speech frames. Because it is applied to both training and testing data, the proposed method extracts noise-robust features without a priori knowledge or environmental adaptation. We evaluated recognition performance in noisy environments using the NOISEX-92 database and a 100 Japanese city names task. CGN improves recognition accuracy at various SNRs compared with combinations of conventional methods.
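The gain normalization step can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code below rests on an assumption: the gain of each cepstral coefficient is estimated as the difference between its maximum and minimum over the frames marked as speech, and every frame is divided by that gain. The function name `cepstral_gain_normalize`, the `is_speech` flags, and the `eps` guard are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence


def cepstral_gain_normalize(
    frames: Sequence[Sequence[float]],
    is_speech: Sequence[bool],
    eps: float = 1e-8,
) -> List[List[float]]:
    """Scale each cepstral coefficient by its range over speech frames.

    Assumption (not taken from the paper): the gain of coefficient k is
    estimated as max_k - min_k, computed only over frames flagged as speech.
    """
    # Keep only the frames flagged as speech for the gain estimate.
    speech = [f for f, s in zip(frames, is_speech) if s]
    if not speech:
        raise ValueError("at least one speech frame is required")

    n_coeffs = len(speech[0])
    gains = []
    for k in range(n_coeffs):
        column = [f[k] for f in speech]
        # eps guards against division by zero for a constant coefficient.
        gains.append(max(max(column) - min(column), eps))

    # Apply the per-coefficient gain to every frame, speech or not.
    return [[f[k] / gains[k] for k in range(n_coeffs)] for f in frames]


if __name__ == "__main__":
    cepstra = [[0.0, 2.0], [4.0, -2.0], [2.0, 0.0]]
    print(cepstral_gain_normalize(cepstra, [True, True, True]))
```

Under this assumption, multiplying all coefficients of an utterance by a common factor leaves the normalized features unchanged. The same function would be applied to both training and testing utterances, consistent with the abstract's statement that no environmental adaptation is needed.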
