A new approach for speech enhancement based on a constrained Nonnegative Matrix Factorization

In this paper, a new approach is presented for single-channel speech enhancement which is based on Nonnegative Matrix Factorization (NMF). The proposed scheme combines the noise Power Spectral Density (PSD) estimation based on a constrained NMF and Wiener filtering to enhance the noisy speech. The imposed constraint is motivated by the time correlation of the underlying observations and enforces the NMF to give smoother estimates of the nonnegative factors. Compared to the standard NMF approach and Wiener filtering based on a recently developed noise PSD estimator, Source to Distortion Ratio (SDR) is improved for the evaluated noise types for different input SNRs.

[1]  Yariv Ephraim,et al.  Recent Advancements in Speech Enhancement , 2004 .

[2]  Tuomas Virtanen,et al.  Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[3]  Rainer Martin,et al.  An evaluation of noise power spectral density estimation algorithms in adverse acoustic environments , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[4]  Alexey Ozerov,et al.  Multichannel Nonnegative Matrix Factorization in Convolutive Mixtures for Audio Source Separation , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[5]  Andrzej Cichocki,et al.  New Algorithms for Non-Negative Matrix Factorization in Applications to Blind Source Separation , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[6]  Bhiksha Raj,et al.  Speech denoising using nonnegative matrix factorization with priors , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[7]  H. Sebastian Seung,et al.  Algorithms for Non-negative Matrix Factorization , 2000, NIPS.

[8]  Peter Vary,et al.  Speech Enhancement by MAP Spectral Amplitude Estimation Using a Super-Gaussian Speech Model , 2005, EURASIP J. Adv. Signal Process..

[9]  Jesper Jensen,et al.  MMSE based noise PSD tracking with low complexity , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[10]  Bhiksha Raj,et al.  Regularized non-negative matrix factorization with temporal dependencies for speech denoising , 2008, INTERSPEECH.

[11]  Ali Taylan Cemgil,et al.  Nonnegative matrix factorizations as probabilistic inference in composite models , 2009, 2009 17th European Signal Processing Conference.

[12]  Paris Smaragdis,et al.  A non-negative approach to semi-supervised separation of speech from noise with the use of temporal dynamics , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[13]  Rémi Gribonval,et al.  Performance measurement in blind audio source separation , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[14]  Bhiksha Raj,et al.  Non-negative Hidden Markov Modeling of Audio with Application to Source Separation , 2010, LVA/ICA.

[15]  Nancy Bertin,et al.  Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis , 2009, Neural Computation.