Utilizing spectro-temporal correlations for an improved speech presence probability based noise power estimation

For the enhancement of speech degraded by noise, accurate estimation of the noise power spectral density (PSD) is indispensable, especially if only a single microphone signal is available. Fast and accurate tracking of the noise PSD is particularly challenging in highly non-stationary noise types, since the distinction between speech and noise components becomes more difficult. Short-time discrete Fourier transform (STFT) based noise PSD estimation algorithms which employ estimates of the speech presence probability (SPP) with fixed priors have been shown to yield good tracking performance even in adverse noise conditions. In this paper, we compare two methods to incorporate spectro-temporal correlations to improve the tracking performance. The first method smoothes the noisy observation over time and frequency before computing the SPP, while the second is based on a Hidden Markov Model (HMM) of the speech presence and absence states. We show that the proposed modifications lead to improved noise PSD estimators which are less sensitive to spectral outliers of the noise and track changes in the noise PSD more quickly than the reference method. Further, when employed in a common speech enhancement setup, the proposed estimators achieve an increased noise reduction while keeping speech distortions at a comparable level.

[1]  Timo Gerkmann,et al.  Speech presence probability estimation based on temporal cepstrum smoothing , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[2]  Richard C. Hendriks,et al.  Unbiased MMSE-Based Noise Power Estimation With Low Complexity and Low Tracking Delay , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[3]  Rainer Martin,et al.  Improved A Posteriori Speech Presence Probability Estimation Based on a Likelihood Ratio With Fixed Priors , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[4]  Ephraim Speech enhancement using a minimum mean square error short-time spectral amplitude estimator , 1984 .

[5]  Reinhold Häb-Umbach,et al.  Using the turbo principle for exploiting temporal and spectral correlations in speech presence probability estimation , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[6]  Israel Cohen,et al.  Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging , 2003, IEEE Trans. Speech Audio Process..

[7]  Herman J. M. Steeneken,et al.  Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems , 1993, Speech Commun..

[8]  Peter Vary,et al.  Speech Enhancement by MAP Spectral Amplitude Estimation Using a Super-Gaussian Speech Model , 2005, EURASIP J. Adv. Signal Process..

[9]  Jesper Jensen,et al.  MMSE based noise PSD tracking with low complexity , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[10]  Reinhold Häb-Umbach,et al.  Exploiting Temporal Correlations in Joint Multichannel Speech Separation and Noise Suppression Using Hidden Markov Models , 2012, IWAENC.

[11]  Rainer Martin,et al.  Noise power spectral density estimation based on optimal smoothing and minimum statistics , 2001, IEEE Trans. Speech Audio Process..