Unsupervised speech enhancement in low SNR environments via sparseness and temporal gradient regularization

Abstract: A crucial stage in unsupervised speech enhancement algorithms is the estimation of noise-related parameters, which usually requires prior models of the noise. However, estimating such parameters is challenging at low signal-to-noise ratios (SNRs) or in nonstationary noisy environments. In this paper, an unsupervised, iterative speech enhancement algorithm is proposed that requires no prior noise models and instead assumes that the speech spectrogram and its temporal gradient are sparse. The quasi-harmonic description of speech signals justifies this assumption. Speech enhancement is performed by decomposing the spectrogram of the noisy speech into a sparse matrix while enforcing sparsity and temporal gradient regularization. The Kullback–Leibler divergence is used to minimize the distance between the observation and the reconstructed components under nonnegativity constraints. The alternating direction method of multipliers (ADMM) is used to solve the resulting optimization problem. Unlike many speech enhancement approaches, the proposed algorithm reduces background noise in a straightforward manner, without needing a noise estimation algorithm to locate noise-only excerpts. In addition, the proposed algorithm achieves improved performance in adverse environments without knowing the exact distribution of the noise. The experimental results demonstrate that the proposed algorithm outperforms the competing algorithms in terms of speech quality and intelligibility. Moreover, the composite objective measures indicate better performance in terms of residual noise and speech distortion in strong noise.
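The abstract does not give the cost function, so the following is only a minimal sketch of what an objective of this kind might look like: a generalized Kullback–Leibler data term plus L1 penalties on a nonnegative spectrogram estimate and on its frame-to-frame (temporal) difference. The function names, the weights `lam` and `mu`, the use of L1 norms, and the first-order difference as the temporal gradient are all assumptions made for illustration, not the authors' formulation, and the ADMM solver itself is omitted.

```python
import numpy as np

def generalized_kl(Y, S, eps=1e-12):
    """Generalized KL divergence D(Y || S) summed over all time-frequency bins."""
    Y = np.asarray(Y, dtype=float)
    S = np.asarray(S, dtype=float)
    # eps guards against log(0) and division by zero in silent bins.
    return float(np.sum(Y * np.log((Y + eps) / (S + eps)) - Y + S))

def temporal_gradient(S):
    """First-order difference along the time axis (columns are frames)."""
    return np.diff(np.asarray(S, dtype=float), axis=1)

def objective(Y, S, lam=0.1, mu=0.1):
    """Hypothetical cost: KL data fit + sparsity + temporal-gradient sparsity.

    Y   : nonnegative magnitude spectrogram of the noisy speech (freq x time)
    S   : nonnegative estimate of the speech spectrogram (same shape)
    lam : assumed weight of the L1 sparsity penalty on S
    mu  : assumed weight of the L1 penalty on the temporal gradient of S
    """
    S = np.asarray(S, dtype=float)
    if np.any(S < 0):
        raise ValueError("S must satisfy the nonnegativity constraint")
    sparsity = np.sum(np.abs(S))
    gradient_sparsity = np.sum(np.abs(temporal_gradient(S)))
    return generalized_kl(Y, S) + lam * float(sparsity) + mu * float(gradient_sparsity)
```

An iterative solver such as ADMM would repeatedly update `S` (and any auxiliary variables) to decrease a cost of this form. Evaluating the function on candidate estimates is one simple way to monitor convergence.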
