Suppression of additive noise using a power spectral density MMSE estimator

In this letter, we propose a novel speech enhancement approach, called power spectral density minimum mean-square error (PSD-MMSE) estimation-based speech enhancement. It is implemented in the power spectral domain, where stationary stochastic noise can be modeled by an exponential distribution and speech magnitude-squared spectra are modeled by a mixed exponential distribution. An MMSE estimator is constructed from these parametric distributions. In addition, a fast algorithm is presented to implement the approach in real time. Experimental results in terms of the Itakura-Saito distortion measure show that the proposed approach is superior to alternative speech enhancement algorithms.
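
To make the estimator described above concrete, here is a minimal sketch of a posterior-mean (MMSE) estimate of clean-speech power in one frequency bin. It is not the letter's fast algorithm, and the details are assumptions made for illustration: the noisy power is taken to be additive (Y = X + N, cross terms ignored), the noise power N is exponential with mean `noise_mean`, the speech power X is a mixture of exponentials with hypothetical parameters `speech_weights` and `speech_means`, and the integrals are evaluated by simple trapezoidal quadrature rather than in closed form.

```python
import numpy as np


def psd_mmse_estimate(y, noise_mean, speech_weights, speech_means, n_grid=2001):
    """Posterior mean E[X | Y = y] of speech power X, assuming Y = X + N.

    Assumptions (illustrative, not taken from the letter):
      N ~ Exponential(mean=noise_mean)
      X ~ sum_k speech_weights[k] * Exponential(mean=speech_means[k])
    """
    if y <= 0.0:
        return 0.0

    w = np.asarray(speech_weights, dtype=float)[:, None]
    m = np.asarray(speech_means, dtype=float)[:, None]

    # Candidate speech powers: X can only lie in [0, y] because N >= 0.
    x = np.linspace(0.0, y, n_grid)

    # Mixed exponential prior density of the speech power.
    prior = (w / m * np.exp(-x[None, :] / m)).sum(axis=0)

    # Exponential density of the implied noise power n = y - x.
    # The constant 1 / noise_mean cancels in the ratio below.
    likelihood = np.exp(-(y - x) / noise_mean)

    posterior = prior * likelihood  # unnormalized p(x | y)

    # Trapezoidal quadrature weights on the uniform grid.
    dx = x[1] - x[0]
    tw = np.full(n_grid, dx)
    tw[0] = tw[-1] = dx / 2.0

    numerator = np.sum(tw * x * posterior)
    denominator = np.sum(tw * posterior)
    return float(numerator / denominator)


if __name__ == "__main__":
    # Two-component speech prior with made-up parameters, unit-mean noise.
    print(psd_mmse_estimate(4.0, 1.0, [0.7, 0.3], [0.5, 8.0]))
```

A quick sanity check on the sketch: with a single speech component whose mean equals the noise mean, the posterior over X is uniform on [0, y], so the estimate is y/2. The grid-based integration loses accuracy when y is very large relative to `noise_mean`, which is one reason a closed-form or table-based implementation would be preferable in a real-time setting.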
