Non-stationary noise estimation method based on bias-residual component decomposition for robust speech recognition

This paper addresses the estimation of non-stationary noise sequences for noise suppression. We assume that non-stationary noise can be decomposed into a stationary component and a non-stationary component. These are modeled, respectively, as a bias component and a residual component, where the residual is the difference between the noise at each frame and the bias component. This decomposition clarifies the role of each component and allows a suitable parameter estimation technique to be applied to each. The bias component is estimated by the EM algorithm using the entire observed signal sequence, whereas the residual component is estimated sequentially by combining the extended Kalman filter with the EM algorithm. Evaluation results confirm that the proposed method improves speech recognition accuracy compared with noise estimation methods that do not use component decomposition.
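The bias-residual decomposition can be illustrated with a toy sketch. This is not the method evaluated in the paper: it assumes the noise sequence is observed directly, replaces the EM estimate of the bias with a plain sample mean over the whole sequence, and replaces the extended Kalman filter with a scalar linear Kalman filter under a random-walk model. The function name `decompose_noise` and the variances `q` and `r` are hypothetical choices made only for illustration.

```python
import math
import random


def decompose_noise(noise, q=0.01, r=0.25):
    """Split a 1-D noise sequence into a constant bias and per-frame residuals.

    bias:      batch estimate from the entire sequence (sample mean, used
               here as a simple stand-in for an EM estimate).
    residuals: sequential estimates from a scalar linear Kalman filter with a
               random-walk state model (a stand-in for the extended Kalman
               filter). q and r are assumed process and observation noise
               variances, not values taken from the paper.
    """
    if not noise:
        raise ValueError("noise must be non-empty")

    # Stationary part: one value for the whole sequence.
    bias = sum(noise) / len(noise)

    # Non-stationary part: track (noise - bias) frame by frame.
    x, p = 0.0, 1.0  # residual estimate and its variance
    residuals = []
    for n_t in noise:
        p += q                        # predict: random walk inflates variance
        k = p / (p + r)               # Kalman gain
        x += k * ((n_t - bias) - x)   # correct with the bias-removed frame
        p *= 1.0 - k                  # posterior variance
        residuals.append(x)
    return bias, residuals


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic noise level: constant 5.0, slow sinusoidal drift, and jitter.
    drift = [2.0 * math.sin(2.0 * math.pi * t / 200.0) for t in range(400)]
    noise = [5.0 + d + rng.gauss(0.0, 0.5) for d in drift]

    bias, residuals = decompose_noise(noise)
    print("estimated bias:", round(bias, 3))
    print("last residual estimate:", round(residuals[-1], 3))
```

In this sketch, the per-frame noise estimate is `bias + residuals[t]`. The batch step captures the level that holds over the whole sequence and the sequential step follows the frame-to-frame deviation from it, which mirrors the split of estimation techniques described above.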
