Robust Estimation of Non-Stationary Noise Power Spectrum for Speech Enhancement

We propose a novel method for noise power spectrum estimation in speech enhancement. This method, called extended-DATE (E-DATE), extends the d-dimensional amplitude trimmed estimator (DATE), originally introduced for additive white Gaussian noise power spectrum estimation in “Robust estimation of noise standard deviation in presence of signals with unknown distributions and occurrences” (D. Pastor and F. Socheleau, IEEE Trans. Signal Processing, vol. 60, no. 4, pp. 1545-1555, Apr. 2012), to the more challenging scenario of non-stationary noise. The key idea is that, in each frequency bin and within a sufficiently short time period, the instantaneous noise power spectrum can be considered approximately constant and estimated as the variance of a complex Gaussian noise process possibly observed in the presence of the signal of interest. The proposed method relies on the fact that the Short-Time Fourier Transform (STFT) of noisy speech signals is sparse, in the sense that transformed speech signals can be represented by a relatively small number of coefficients with large amplitudes in the time-frequency domain. The E-DATE estimator is robust in that it requires no prior information about the signal probability distribution beyond this weak-sparseness property. Compared with other state-of-the-art methods, the E-DATE is found to require the smallest number of parameters (only two). The performance of the proposed estimator has been evaluated in combination with noise reduction and compared to alternative methods, using objective as well as pseudo-subjective criteria.
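The per-bin, short-window idea can be illustrated with a much simpler robust estimator than E-DATE itself. The sketch below is not the DATE or E-DATE algorithm: it swaps in a plain median-based estimate, relying on the fact that for circular complex Gaussian noise of variance sigma^2 the squared magnitude is exponentially distributed with median sigma^2 ln 2. The function name `median_noise_power`, the block size, the 5% speech occupancy, and the synthetic "speech" coefficients are all assumptions made for illustration only.

```python
import numpy as np

def median_noise_power(stft_block):
    """Robust per-bin noise power from a (n_bins, n_frames) block of complex
    STFT coefficients, assumed short enough for the noise to be stationary."""
    power = np.abs(stft_block) ** 2
    # For circular complex Gaussian noise of variance s2, |X|^2 is exponential
    # with mean s2, so its median equals s2 * ln(2).
    return np.median(power, axis=1) / np.log(2.0)

# Synthetic data (hypothetical): 4 frequency bins, 400 frames.
rng = np.random.default_rng(0)
n_bins, n_frames = 4, 400
true_power = np.array([0.5, 1.0, 2.0, 4.0])

# Complex Gaussian noise with a different variance in each bin.
noise = np.sqrt(true_power[:, None] / 2.0) * (
    rng.standard_normal((n_bins, n_frames))
    + 1j * rng.standard_normal((n_bins, n_frames))
)

# Sparse "speech": about 5% of the coefficients carry a large amplitude.
mask = rng.random((n_bins, n_frames)) < 0.05
phase = np.exp(2j * np.pi * rng.random((n_bins, n_frames)))
noisy = noise + mask * 20.0 * phase

estimate = median_noise_power(noisy)          # robust to the sparse outliers
naive = np.mean(np.abs(noisy) ** 2, axis=1)   # inflated by the speech energy

print("true    :", true_power)
print("median  :", np.round(estimate, 2))
print("naive   :", np.round(naive, 2))
```

Because only a small fraction of coefficients carry speech, the median stays close to the noise-only value while the plain average is dominated by the large-amplitude coefficients. The median is still biased slightly upward by the contaminated coefficients, and the bias grows with speech occupancy.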
