Basis compensation in non-negative matrix factorization model for speech enhancement

In this paper, we propose a basis compensation algorithm for non-negative matrix factorization (NMF) models applied to supervised single-channel speech enhancement. In the proposed framework, extra free basis vectors are used for both the clean speech and the noise during the enhancement stage in order to capture features that are not represented in the training data. Specifically, the free basis vectors of the clean speech are obtained by exploiting a priori knowledge based on a Gamma distribution. The free basis vectors of the noise are estimated using a regularization approach that constrains them to be orthogonal to the clean speech and noise basis vectors estimated during the training stage. Experimental results show that the proposed NMF algorithm with basis compensation outperforms the benchmark algorithms in speech enhancement.
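The idea of free bases in the enhancement stage can be illustrated with a minimal NumPy sketch. It is not the algorithm of the paper: it assumes a Euclidean cost with standard multiplicative updates, models the orthogonality constraint as a quadratic penalty with a hypothetical weight `mu`, and omits the Gamma prior on the free speech bases entirely. The function name `enhance_with_free_bases` and all parameter names are illustrative assumptions.

```python
import numpy as np


def enhance_with_free_bases(V, W_s, W_n, n_free_s=2, n_free_n=2,
                            mu=0.1, n_iter=100, seed=0, eps=1e-12):
    """Sketch of NMF enhancement with fixed trained bases plus free bases.

    V   : non-negative magnitude spectrogram of the noisy signal (freq x frames)
    W_s : trained speech bases (freq x k_s), kept fixed
    W_n : trained noise bases (freq x k_n), kept fixed
    mu  : assumed weight of the penalty ||B^T F_n||_F^2 that discourages
          overlap between free noise bases F_n and the trained bases B
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    k_s, k_n = W_s.shape[1], W_n.shape[1]

    # Free bases start from random non-negative values.
    F_s = rng.random((n_freq, n_free_s)) + eps
    F_n = rng.random((n_freq, n_free_n)) + eps
    H = rng.random((k_s + n_free_s + k_n + n_free_n, n_frames)) + eps

    B = np.hstack([W_s, W_n])   # all trained bases (fixed)
    BBt = B @ B.T               # reused in the penalty gradient

    for _ in range(n_iter):
        W = np.hstack([W_s, F_s, W_n, F_n])

        # Multiplicative update of all activations (Euclidean cost).
        H *= (W.T @ V) / (W.T @ W @ H + eps)

        V_hat = W @ H
        H_fs = H[k_s:k_s + n_free_s]
        H_fn = H[k_s + n_free_s + k_n:]

        # Free speech bases: plain update (no Gamma prior in this sketch).
        F_s *= (V @ H_fs.T) / (V_hat @ H_fs.T + eps)

        # Free noise bases: the penalty gradient mu * B B^T F_n enters
        # the denominator, shrinking components aligned with trained bases.
        F_n *= (V @ H_fn.T) / (V_hat @ H_fn.T + mu * BBt @ F_n + eps)

    W = np.hstack([W_s, F_s, W_n, F_n])
    n_speech = k_s + n_free_s
    S_hat = W[:, :n_speech] @ H[:n_speech]
    N_hat = W[:, n_speech:] @ H[n_speech:]

    # Wiener-like soft mask applied to the noisy magnitude spectrogram.
    mask = S_hat / (S_hat + N_hat + eps)
    return mask * V, F_s, F_n


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    V = rng.random((16, 30))      # toy "noisy spectrogram"
    W_s = rng.random((16, 4))     # toy trained speech bases
    W_n = rng.random((16, 3))     # toy trained noise bases
    S, F_s, F_n = enhance_with_free_bases(V, W_s, W_n)
    print(S.shape, F_s.shape, F_n.shape)
```

In this sketch the trained bases never change; only the activations and the free bases adapt to the noisy input. Column normalization of the free bases is left out for brevity, so the penalty can partly be absorbed by rescaling between `F_n` and its activations; a practical implementation would need to handle that.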
