Regularized nonnegative matrix factorization using Gaussian mixture priors for supervised single channel source separation

We introduce a new regularized nonnegative matrix factorization (NMF) method for supervised single-channel source separation (SCSS). We propose a new multi-objective cost function which includes the conventional divergence term for the NMF together with a prior likelihood term. The first term measures the divergence between the observed data and the multiplication of basis and gains matrices. The novel second term encourages the log-normalized gain vectors of the NMF solution to increase their likelihood under a prior Gaussian mixture model (GMM) which is used to encourage the gains to follow certain patterns. In this model, the parameters to be estimated are the basis vectors, the gain vectors and the parameters of the GMM prior. We introduce two different ways to train the model parameters, sequential training and joint training. In sequential training, after finding the basis and gains matrices, the gains matrix is then used to train the prior GMM in a separate step. In joint training, within each NMF iteration the basis matrix, the gains matrix and the prior GMM parameters are updated jointly using the proposed regularized NMF. The normalization of the gains makes the prior models energy independent, which is an advantage as compared to earlier proposals. In addition, GMM is a much richer prior than the previously considered alternatives such as conjugate priors which may not represent the distribution of the gains in the best possible way. In the separation stage after observing the mixed signal, we use the proposed regularized cost function with a combined basis and the GMM priors for all sources that were learned from training data for each source. Only the gain vectors are estimated from the mixed data by minimizing the joint cost function. We introduce novel update rules that solve the optimization problem efficiently for the new regularized NMF problem. This optimization is challenging due to using energy normalization and GMM for prior modeling, which makes the problem highly nonlinear and non-convex. The experimental results show that the introduced methods improve the performance of single channel source separation for speech separation and speech-music separation with different NMF divergence functions. The experimental results also show that, using the GMM prior gives better separation results than using the conjugate prior.

[1]  David J. C. MacKay,et al.  Comparison of Approximate Methods for Handling Hyperparameters , 1999, Neural Computation.

[2]  Tuomas Virtanen,et al.  Separation of drums from polyphonic music using non-negative matrix factorization and support vector machine , 2005, 2005 13th European Signal Processing Conference.

[3]  Emmanuel Vincent,et al.  Enforcing Harmonicity and Smoothness in Bayesian Non-Negative Matrix Factorization Applied to Polyphonic Music Transcription , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[4]  Hakan Erdogan,et al.  Single channel speech-music separation using matching pursuit and spectral masks , 2011, 2011 IEEE 19th Signal Processing and Communications Applications Conference (SIU).

[5]  Perry R. Cook,et al.  Bayesian Nonparametric Matrix Factorization for Recorded Music , 2010, ICML.

[6]  Richard M. Dansereau,et al.  Scaled factorial hidden Markov models: A new technique for compensating gain differences in model-based single channel speech separation , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[7]  Ali Taylan Cemgil,et al.  Bayesian Inference for Nonnegative Matrix Factorisation Models , 2009, Comput. Intell. Neurosci..

[8]  Hakan Erdogan,et al.  Audio-visual speech recognition with background music using single-channel source separation , 2012, 2012 20th Signal Processing and Communications Applications Conference (SIU).

[9]  H. Sebastian Seung,et al.  Algorithms for Non-negative Matrix Factorization , 2000, NIPS.

[10]  Mikkel N. Schmidt,et al.  Single-channel speech separation using sparse non-negative matrix factorization , 2006, INTERSPEECH.

[11]  Alexey Ozerov,et al.  Multichannel nonnegative matrix factorization in convolutive mixtures. With application to blind audio source separation , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[12]  Roger K. Moore Computer Speech and Language , 1986 .

[13]  Bhiksha Raj,et al.  Phoneme-Dependent NMF for Speech Enhancement in Monaural Mixtures , 2011, INTERSPEECH.

[14]  Hakan Erdogan,et al.  Single Channel Speech Music Separation Using Nonnegative Matrix Factorization with Sliding Windows and Spectral Masks , 2011, INTERSPEECH.

[15]  Emad M. Grais,et al.  Single channel speech music separation using nonnegative matrix factorization and spectral masks , 2011, 2011 17th International Conference on Digital Signal Processing (DSP).

[16]  Bhiksha Raj,et al.  Non-negative matrix factorization based compensation of music for automatic speech recognition , 2010, INTERSPEECH.

[17]  Bhiksha Raj,et al.  Regularized non-negative matrix factorization with temporal dependencies for speech denoising , 2008, INTERSPEECH.

[18]  Hermann Ney,et al.  Using posterior word probabilities for improved speech recognition , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[19]  Seiichi Nakagawa,et al.  Speech Recognition in Mixed Sound of Speech and Music Based on Vector Quantization and Non-Negative Matrix Factorization , 2011, INTERSPEECH.

[20]  Nancy Bertin,et al.  Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis , 2009, Neural Computation.

[21]  Pando G. Georgiev,et al.  Blind Source Separation Algorithms with Matrix Constraints , 2003, IEICE Trans. Fundam. Electron. Commun. Comput. Sci..

[22]  Simon J. Godsill,et al.  Bayesian extensions to non-negative matrix factorisation for audio signal modelling , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[23]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[24]  Andrzej Cichocki,et al.  New Algorithms for Non-Negative Matrix Factorization in Applications to Blind Source Separation , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[25]  Andrzej Cichocki,et al.  Constrained non-Negative Matrix Factorization Method for EEG Analysis in Early Detection of Alzheimer Disease , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[26]  Ali Taylan Cemgil,et al.  Conjugate Gamma Markov Random Fields for Modelling Nonstationary Sources , 2007, ICA.

[27]  Mark D. Plumbley,et al.  INVESTIGATING SINGLE-CHANNEL AUDIO SOURCE SEPARATION METHODS BASED ON NON-NEGATIVE MATRIX FACTORIZATION , 2006 .

[28]  Tuomas Virtanen,et al.  Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[29]  Ali Taylan Cemgil,et al.  Mixtures of Gamma Priors for Non-negative Matrix Factorization Based Speech Separation , 2009, ICA.

[30]  Emmanuel Vincent,et al.  Fast bayesian nmf algorithms enforcing harmonicity and temporal continuity in polyphonic music transcription , 2009, 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.

[31]  Hakan Erdogan,et al.  Adaptation of Speaker-Specific Bases in Non-Negative Matrix Factorization for Single Channel Speech-Music Separation , 2011, INTERSPEECH.

[32]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[33]  Bhiksha Raj,et al.  Speech denoising using nonnegative matrix factorization with priors , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[34]  Biing-Hwang Juang,et al.  Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.

[35]  Rémi Gribonval,et al.  Performance measurement in blind audio source separation , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[36]  John F. Canny,et al.  GaP: a factor model for discrete data , 2004, SIGIR '04.

[37]  Xabier Jaureguiberry,et al.  Adaptation of source-specific dictionaries in Non-Negative Matrix Factorization for source separation , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[38]  T. Ferguson A Bayesian Analysis of Some Nonparametric Problems , 1973 .