Noise Reduction Based on Soft Masks by Incorporating SNR Uncertainty in Frequency Domain

The binary mask approach has recently been studied as a way to reduce background noise and improve speech intelligibility and quality in noisy surroundings. The mask is usually applied to the time–frequency representation of noisy speech: it discards the portions of the speech that fall below a signal-to-noise ratio (SNR) threshold and lets the others pass through intact. The threshold, however, is normally very low, so considerable residual noise remains. Moreover, precisely estimating the local instantaneous SNR is difficult in practical applications. By modeling the local instantaneous SNR as a Fisher–Snedecor distributed random variable, soft masks for noise reduction are derived that incorporate SNR uncertainty in the frequency domain. Instead of seeking a different method to estimate the local instantaneous SNR, the approach computes the probability that the local instantaneous SNR is higher than the threshold. The results indicate that the soft masks yield significantly better speech quality in terms of speech distortion and residual noise.
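The contrast between a hard threshold and a probability-valued mask can be illustrated with a minimal Python sketch. It is not the derivation from the paper: it assumes, purely for illustration, that the true local SNR equals the estimated SNR multiplied by an F(2, 2) random variable, a special case of the Fisher–Snedecor distribution whose survival function has the closed form 1 / (1 + x). The function names, the default threshold of -5 dB, and the choice of degrees of freedom are all hypothetical.

```python
def db_to_linear(db):
    """Convert a value in decibels to a linear power ratio."""
    return 10.0 ** (db / 10.0)


def binary_mask(snr_est, threshold_db=-5.0):
    """Hard mask: keep a time-frequency unit only if its estimated SNR
    (linear scale) is strictly above the threshold."""
    thr = db_to_linear(threshold_db)
    return [[1.0 if s > thr else 0.0 for s in frame] for frame in snr_est]


def f22_survival(x):
    """P(F > x) for F ~ F(2, 2). The CDF of F(2, 2) is x / (1 + x)."""
    return 1.0 / (1.0 + x)


def soft_mask(snr_est, threshold_db=-5.0):
    """Soft mask: probability that the true SNR exceeds the threshold,
    under the illustrative assumption true_snr = snr_est * F, F ~ F(2, 2).
    Units with a non-positive SNR estimate receive a gain of 0."""
    thr = db_to_linear(threshold_db)
    return [
        [f22_survival(thr / s) if s > 0.0 else 0.0 for s in frame]
        for frame in snr_est
    ]


if __name__ == "__main__":
    # One frame with four frequency bins, SNR estimates on a linear scale.
    snr = [[0.1, 0.5, 2.0, 10.0]]
    print(binary_mask(snr))
    print(soft_mask(snr))
```

Under this assumption the soft mask equals 0.5 where the estimated SNR sits exactly at the threshold and moves smoothly toward 0 or 1 on either side, whereas the binary mask jumps between the two values. Other degrees of freedom would need the general F-distribution survival function instead of the closed form used here.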
