Regularized NMF-based speech enhancement with spectral components modeled by Gaussian mixtures

In this paper, we introduce a single-channel speech enhancement algorithm based on regularized non-negative matrix factorization (NMF). In the proposed formulation, the log-likelihood function (LLF) of the magnitude spectral components, based on Gaussian mixture models (GMMs) for both the speech and the background noise signals, is included as a regularization term in the NMF cost function. This spectral regularization lets us incorporate the statistical properties of the signals when estimating both the basis and excitation matrices of the NMF model. Furthermore, to reduce the computational complexity of the NMF updates, we borrow from the expectation-maximization (EM) algorithm and replace the LLF with its expected value. Experimental results in terms of perceptual evaluation of speech quality (PESQ), source-to-distortion ratio (SDR), and source-to-interference ratio (SIR) show that the proposed speech enhancement algorithm outperforms the benchmark algorithms it was compared against.
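
As a rough illustration of the structure described above, the regularized cost might be written as follows. This is only a sketch under assumed notation, not the paper's exact formulation: the symbols V (observed magnitude spectrogram), W (basis matrix), H (excitation matrix), D (a divergence measure), and \lambda (a regularization weight) do not appear in the abstract, and the abstract does not say which divergence is used or how the expectation is taken. In standard EM, the expectation would be over the latent mixture-component assignments.

```latex
% Assumed notation (hypothetical, not from the source):
%   V        observed magnitude spectrogram (nonnegative)
%   W, H     basis and excitation matrices (nonnegative)
%   D        a divergence measuring the fit of WH to V
%   \lambda  nonnegative regularization weight
%   \mathcal{L}  GMM-based log-likelihood of the spectral components
\begin{align}
  % Plain NMF: fit V \approx WH with nonnegative factors.
  (\hat{W}, \hat{H}) &= \arg\min_{W \ge 0,\; H \ge 0} D(V \,\|\, WH) \\
  % Regularized cost: the log-likelihood is subtracted because the
  % cost is minimized while the likelihood should be large.
  C(W, H) &= D(V \,\|\, WH) - \lambda\, \mathcal{L}(W, H) \\
  % EM-style simplification: the LLF is replaced by its expected value.
  C_{\mathrm{EM}}(W, H) &= D(V \,\|\, WH)
      - \lambda\, \mathbb{E}\!\left[ \mathcal{L}(W, H) \right]
\end{align}
```

Under this reading, setting \lambda = 0 recovers ordinary NMF, while larger values pull the estimated factors toward spectra that the trained speech and noise GMMs consider likely.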
