Discriminative nonnegative dictionary learning using cross-coherence penalties for single channel source separation

In this work, we introduce a new discriminative training method for nonnegative dictionary learning that can be used in single channel source separation (SCSS) applications. In SCSS, nonnegative matrix factorization (NMF) is used to learn a dictionary (a set of basis vectors) for each source in the magnitude spectrum domain. The trained dictionaries are then used to decompose the mixed signal and estimate each source. Learning discriminative dictionaries for the source signals can improve the separation performance. To obtain discriminative dictionaries, we try to prevent the basis set of one source dictionary from representing the other source signals. To this end, we propose to minimize the cross-coherence between the dictionaries of all sources in the mixed signal. We incorporate a simplified cross-coherence penalty through a regularized NMF cost function, so that the learned dictionaries are simultaneously discriminative and reconstructive. We also introduce the new regularized NMF update rules used to discriminatively train the dictionaries. Experimental results show that discriminative training gives better separation results than conventional NMF. Copyright © 2013 ISCA.
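The general idea can be illustrated with a minimal NumPy sketch. It is not the update rule derived in the paper: it assumes a Euclidean NMF cost, a squared Frobenius penalty on the cross-Gram matrix B1ᵀB2 as the cross-coherence term, heuristic multiplicative updates, and a Wiener-like spectral mask for the final estimates. The function names and the parameters `rank`, `lam`, `n_iter`, and `eps` are hypothetical choices made for illustration.

```python
import numpy as np


def train_discriminative_nmf(V1, V2, rank=8, lam=0.1, n_iter=200, eps=1e-9, seed=0):
    """Jointly learn nonnegative dictionaries B1, B2 for two sources.

    Assumed cost (an illustration, not the paper's exact formulation):
        0.5*||V1 - B1 G1||^2 + 0.5*||V2 - B2 G2||^2 + 0.5*lam*||B1^T B2||_F^2
    V1, V2 are nonnegative magnitude spectrograms (frequency x frames).
    """
    rng = np.random.default_rng(seed)
    n_freq = V1.shape[0]
    B1 = rng.random((n_freq, rank)) + eps
    B2 = rng.random((n_freq, rank)) + eps
    G1 = rng.random((rank, V1.shape[1])) + eps
    G2 = rng.random((rank, V2.shape[1])) + eps

    for _ in range(n_iter):
        # Standard multiplicative updates for the gain matrices.
        G1 *= (B1.T @ V1) / (B1.T @ B1 @ G1 + eps)
        G2 *= (B2.T @ V2) / (B2.T @ B2 @ G2 + eps)

        # Dictionary updates: the penalty gradient lam * B2 B2^T B1 is
        # nonnegative, so it is added to the denominator.
        B1 *= (V1 @ G1.T) / (B1 @ (G1 @ G1.T) + lam * B2 @ (B2.T @ B1) + eps)
        B2 *= (V2 @ G2.T) / (B2 @ (G2 @ G2.T) + lam * B1 @ (B1.T @ B2) + eps)

        # Normalise columns to unit length so the penalty cannot be reduced
        # by merely shrinking the bases; rescale gains to keep B @ G fixed.
        n1 = np.linalg.norm(B1, axis=0) + eps
        n2 = np.linalg.norm(B2, axis=0) + eps
        B1 /= n1
        G1 *= n1[:, None]
        B2 /= n2
        G2 *= n2[:, None]

    return B1, B2


def separate(V_mix, B1, B2, n_iter=200, eps=1e-9, seed=0):
    """Decompose a mixture over the fixed dictionaries and mask it."""
    rng = np.random.default_rng(seed)
    B = np.hstack([B1, B2])
    G = rng.random((B.shape[1], V_mix.shape[1])) + eps
    for _ in range(n_iter):
        G *= (B.T @ V_mix) / (B.T @ B @ G + eps)

    k = B1.shape[1]
    S1 = B1 @ G[:k]
    S2 = B2 @ G[k:]
    mask = S1 / (S1 + S2 + eps)  # soft mask in [0, 1]
    return mask * V_mix, (1.0 - mask) * V_mix
```

Here `lam` trades reconstruction quality against discrimination: with `lam=0` the training step reduces to two independent conventional NMF problems, while larger values push the two dictionaries toward lower mutual coherence.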
