Nonnegative matrix factorization with disjointness constraints for single channel speech separation

This paper addresses the problem of single channel speech separation using the nonnegative matrix factorization (NMF) technique. In general, the standard NMF algorithm by itself does not guarantee any statistical relationship between the matrices it computes, which leads to poor separation performance. To address this problem, we propose to enforce a disjointness constraint on the standard NMF algorithm during the separation process. The multiplicative update rules of the proposed algorithm are also derived in this paper. The performance of the proposed method is compared with that of the standard NMF algorithm, which is based on the same linear model. The experimental results show that the proposed method achieves better separation quality than standard NMF.
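For reference, the sketch below shows the baseline that the proposed method is compared against: standard NMF of a magnitude spectrogram with Lee-Seung multiplicative updates for the Euclidean cost. The proposed algorithm adds a disjointness penalty to this objective and modifies the corresponding update rules, which are derived in the paper and not reproduced here; the function name, rank, and iteration count below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, eps=1e-9):
    """Standard NMF baseline (Lee-Seung multiplicative updates, Euclidean cost).

    Factorizes a nonnegative magnitude spectrogram V (freq x frames) as
    V ~= W @ H, with W >= 0 (spectral basis vectors) and H >= 0 (activations).
    The disjointness-constrained variant proposed in the paper augments this
    cost with a penalty term and alters these updates accordingly.
    """
    n_freq, n_frames = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps

    for _ in range(n_iter):
        # Activation update: H <- H * (W^T V) / (W^T W H)
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # Basis update: W <- W * (V H^T) / (W H H^T)
        W *= (V @ H.T) / (W @ H @ H.T + eps)

    return W, H
```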
