Estimators of the Magnitude-Squared Spectrum and Methods for Incorporating SNR Uncertainty

Statistical estimators of the magnitude-squared spectrum are derived under the assumption that the magnitude-squared spectrum of the noisy speech signal is the sum of the clean-signal and noise magnitude-squared spectra. Maximum a posteriori (MAP) and minimum mean square error (MMSE) estimators are derived from a Gaussian statistical model. The gain function of the MAP estimator is found to be identical to that of the ideal binary mask (IdBM), which is widely used in computational auditory scene analysis (CASA): it is binary, taking the value 1 if the local signal-to-noise ratio (SNR) exceeds 0 dB and 0 otherwise. By modeling the local instantaneous SNR as an F-distributed random variable, soft masking methods that incorporate SNR uncertainty are derived. In particular, the soft masking method that weights the noisy magnitude-squared spectrum by the a priori probability that the local SNR exceeds 0 dB is shown to be identical to the Wiener gain function. Results indicate that the proposed estimators yield significantly better speech quality than conventional MMSE spectral power estimators, with lower residual noise and lower speech distortion.
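The two gain rules named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the example spectra, and the use of a single known per-bin SNR value for both rules are assumptions made purely for illustration. The binary rule passes a bin only when its SNR exceeds 0 dB (a linear ratio of 1), while the Wiener rule scales each bin by SNR / (1 + SNR).

```python
import numpy as np

def db_to_linear(snr_db):
    """Convert an SNR in dB to a linear power ratio."""
    return 10.0 ** (np.asarray(snr_db, dtype=float) / 10.0)

def binary_mask_gain(snr):
    """Binary gain: 1 where the linear SNR exceeds 1 (0 dB), else 0."""
    snr = np.asarray(snr, dtype=float)
    return (snr > 1.0).astype(float)

def wiener_gain(snr):
    """Wiener gain SNR / (1 + SNR), a soft value strictly between 0 and 1."""
    snr = np.asarray(snr, dtype=float)
    return snr / (1.0 + snr)

# Hypothetical per-bin values, chosen only for illustration.
noisy_power = np.array([4.0, 1.0, 0.25, 9.0])  # noisy magnitude-squared spectrum
snr_db = np.array([10.0, 0.0, -5.0, 3.0])      # assumed known per-bin SNR in dB

snr = db_to_linear(snr_db)
hard_estimate = binary_mask_gain(snr) * noisy_power  # bins at or below 0 dB are zeroed
soft_estimate = wiener_gain(snr) * noisy_power       # every bin is attenuated smoothly

print(hard_estimate)
print(soft_estimate)
```

In this sketch the bin at exactly 0 dB is zeroed by the binary rule, since the condition is a strict "exceeds", whereas the Wiener rule halves it. In practice the SNR is not known and must be estimated from the noisy signal, which this example does not attempt.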
