Adding noise to improve noise robustness in speech recognition

In this work we explore a technique for improving recognition accuracy on speech corrupted by noise of unknown nature: a known, well-behaved noise (masking noise) is added to the corrupted speech. The same type of masking noise is added to the training data, which reduces the mismatch between training and test conditions regardless of the type of corrupting noise and of whether it is stationary. Although still at an early stage of development, the approach shows consistent improvements in accuracy and robustness across a variety of conditions, without using any a priori knowledge of the corrupting noise. It is of particular interest for cross-talk corrupting noise, a difficult case in speech recognition, where the relative gain with the proposed approach is over 24%.
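The core idea, adding the same kind of known noise to both training and test audio, can be illustrated with a minimal sketch. The abstract does not state which masking noise or level was used, so the white Gaussian noise, the 20 dB signal-to-noise ratio, and the helper name `add_masking_noise` are all assumptions made purely for illustration.

```python
import math
import random


def add_masking_noise(samples, snr_db, seed=0):
    """Return `samples` plus white Gaussian noise at roughly `snr_db` dB SNR.

    Hypothetical helper: the noise type (white Gaussian) and the SNR-based
    scaling are illustrative assumptions, not the method from the paper.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    rng = random.Random(seed)
    # Average power of the input signal.
    signal_power = sum(x * x for x in samples) / len(samples)
    # Noise power that yields the requested signal-to-noise ratio.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in samples]


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


# Toy "utterance": one second of a 440 Hz tone sampled at 8 kHz.
clean = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]

# The same kind of masking noise is applied to training and test audio,
# so both sides of the recognizer see a similar noise floor.
train_masked = add_masking_noise(clean, snr_db=20, seed=1)
test_masked = add_masking_noise(clean, snr_db=20, seed=2)

# Measure the SNR actually achieved on the training copy.
added = [m - c for m, c in zip(train_masked, clean)]
measured_snr_db = 10 * math.log10(power(clean) / power(added))
```

In a real system the test signal would already contain the unknown corrupting noise before masking is applied; the sketch omits that step and only shows how a known noise might be mixed in at a controlled level.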
