A novel speech enhancement method by learnable sparse and low-rank decomposition and domain adaptation

The sparse and low-rank components of a corrupted signal are treated as speech and noise, respectively, in the time-frequency domain. A new alternating optimization algorithm is proposed for decomposing the noisy data using sparse coding over the speech and noise dictionaries. A low sparse approximation error is obtained with adequate noise training frames supplied by a noise estimation algorithm. A domain adaptation method alleviates the mismatch between training and test conditions. The main drawbacks of earlier dictionary-learning-based speech enhancement methods can be overcome.

This paper proposes a novel speech enhancement algorithm based on a sparse and low-rank decomposition model. The sparse and low-rank components of a corrupted signal are treated as speech and noise, respectively, in the time-frequency domain. We use a new alternating optimization algorithm to accurately decompose the noisy observed data using sparse coding over the speech and noise dictionaries. A noise estimation algorithm provides adequate noise training frames, equal in data size to the speech data, so that overcomplete noise dictionaries can be learned with low sparse approximation error. Because non-stationary noise degrades the performance of speech enhancement methods, we use domain adaptation, applied here as a novel speech enhancement procedure, to transform a learned noise dictionary into an adapted dictionary using the data distribution captured during the enhancement process. With this step, the separation scheme adapts to the current conditions of the noisy space, and the main drawbacks of earlier dictionary-learning-based speech enhancement methods are overcome. According to several objective and subjective measures, the proposed approach yields a significant reduction of noise, especially non-stationary noise, compared with earlier methods of this kind and with traditional procedures.
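The general idea of splitting a time-frequency matrix into a sparse part and a low-rank part by alternating updates can be illustrated with a minimal sketch. This is not the dictionary-based algorithm proposed in the paper: it swaps in a generic scheme that alternates a truncated SVD (low-rank step) with soft-thresholding (sparse step). The function names `soft_threshold` and `sparse_lowrank_split`, the parameters `rank`, `tau`, and `n_iter`, and the synthetic input standing in for a magnitude spectrogram are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(x, tau):
    """Shrink every entry of x toward zero by tau (promotes sparsity)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def sparse_lowrank_split(Y, rank=2, tau=0.5, n_iter=50):
    """Alternately fit Y ~ L + S with L of limited rank and S sparse.

    Hypothetical helper: Y plays the role of a noisy magnitude
    spectrogram, L of the low-rank (noise-like) part, and S of the
    sparse (speech-like) part.
    """
    S = np.zeros_like(Y)
    for _ in range(n_iter):
        # Low-rank step: best rank-`rank` approximation of Y - S.
        U, s, Vt = np.linalg.svd(Y - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Sparse step: keep only the large entries of Y - L.
        S = soft_threshold(Y - L, tau)
    return L, S


# Synthetic stand-in for a spectrogram: rank-1 background plus a few spikes.
rng = np.random.default_rng(0)
background = np.outer(rng.random(20), rng.random(40))
spikes = np.zeros((20, 40))
spikes[rng.integers(0, 20, 15), rng.integers(0, 40, 15)] = 5.0
Y = background + spikes

L, S = sparse_lowrank_split(Y, rank=1, tau=0.5)
print("rank of L:", np.linalg.matrix_rank(L))
print("non-zero entries in S:", np.count_nonzero(S))
```

In this sketch the two components are constrained directly (rank limit and thresholding), whereas the abstract describes sparse coding over learned speech and noise dictionaries, with the noise dictionary further adapted to the test conditions.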
