Speech enhancement based on nonnegative matrix factorization with mixed group sparsity constraint

This paper addresses a challenging single-channel speech enhancement problem in real-world environment where speech signal is corrupted by high level background noise. While most state-of-the-art algorithms tries to estimate noise spectral power and filter it from the observed one to obtain enhanced speech, the paper discloses another approach inspired from audio source separation technique. In the considered method, generic spectral characteristics of speech and noise are first learned from various training signals by non-negative matrix factorization (NMF). They are then used to guide the similar factorization of the observed power spectrogram into speech part and noise part. Additionally, we propose to combine two existing group sparsity-inducing penalties in the optimization process and adapt the corresponding algorithm for parameter estimation based on multiplicative update (MU) rule. Experiment results over different settings confirm the effectiveness of the proposed approach.

[1]  Francis Bach,et al.  Itakura-Saito nonnegative matrix factorization with group sparsity , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[2]  Rémi Gribonval,et al.  Spatial location priors for Gaussian model based reverberant audio source separation , 2013, EURASIP J. Adv. Signal Process..

[3]  Rainer Martin,et al.  Noise power spectral density estimation based on optimal smoothing and minimum statistics , 2001, IEEE Trans. Speech Audio Process..

[4]  Te-Won Lee,et al.  Blind Speech Separation , 2007, Blind Speech Separation.

[5]  Rémi Gribonval,et al.  Performance measurement in blind audio source separation , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[6]  Dalia El Badawy,et al.  On-the-fly audio source separation , 2014, 2014 IEEE International Workshop on Machine Learning for Signal Processing (MLSP).

[7]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[8]  Philipos C. Loizou,et al.  Improving Speech Intelligibility in Noise Using Environment-Optimized Algorithms , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[9]  Tuomas Virtanen,et al.  Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[10]  Fabian J. Theis,et al.  The signal separation evaluation campaign (2007-2010): Achievements and remaining challenges , 2012, Signal Process..

[11]  H. Sebastian Seung,et al.  Algorithms for Non-negative Matrix Factorization , 2000, NIPS.

[12]  Jacob Benesty,et al.  Speech Enhancement , 2010 .

[13]  Gautham J. Mysore,et al.  Universal speech models for speaker independent single channel source separation , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[14]  Dalia El Badawy,et al.  Relative group sparsity for non-negative matrix factorization with application to on-the-fly audio source separation , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[15]  Nancy Bertin,et al.  Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis , 2009, Neural Computation.

[16]  I. Cohen Optimal speech enhancement under signal presence uncertainty using log-spectral amplitude estimator , 2002, IEEE Signal Processing Letters.

[17]  Bhiksha Raj,et al.  Supervised and Semi-supervised Separation of Sounds from Single-Channel Mixtures , 2007, ICA.

[18]  Ning Ma,et al.  The PASCAL CHiME speech separation and recognition challenge , 2013, Comput. Speech Lang..

[19]  Rahim Saeidi,et al.  Group Sparsity for Speaker Identity Discrimination in Factorisation-based Speech Recognition , 2012, INTERSPEECH.