Single channel speech music separation using nonnegative matrix factorization and spectral masks

A single channel speech-music separation algorithm based on nonnegative matrix factorization (NMF) with spectral masks is proposed in this work. The proposed algorithm uses training data of speech and music signals with nonnegative matrix factorization followed by masking to separate the mixed signal. In the training stage, NMF uses the training data to train a set of basis vectors for each source. These bases are trained using NMF in the magnitude spectrum domain. After observing the mixed signal, NMF is used to decompose its magnitude spectra into a linear combination of the trained bases for both sources. The decomposition results are used to build a mask, which explains the contribution of each source in the mixed signal. Experimental results show that using masks after NMF improves the separation process even when calculating NMF with fewer iterations, which yields a faster separation process.

[1]  H. Sebastian Seung,et al.  Algorithms for Non-negative Matrix Factorization , 2000, NIPS.

[2]  Tuomas Virtanen,et al.  Separation of drums from polyphonic music using non-negative matrix factorization and support vector machine , 2005, 2005 13th European Signal Processing Conference.

[3]  Rémi Gribonval,et al.  Performance measurement in blind audio source separation , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[4]  Rémi Gribonval,et al.  Experiments in audio source separation with one sensor for robust speech recognition , 2006, Speech Commun..

[5]  Mark D. Plumbley,et al.  INVESTIGATING SINGLE-CHANNEL AUDIO SOURCE SEPARATION METHODS BASED ON NON-NEGATIVE MATRIX FACTORIZATION , 2006 .

[6]  Mikkel N. Schmidt,et al.  Single-channel speech separation using sparse non-negative matrix factorization , 2006, INTERSPEECH.

[7]  Rémi Gribonval,et al.  Adaptation of Bayesian Models for Single-Channel Source Separation and its Application to Voice/Music Separation in Popular Songs , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[8]  Tuomas Virtanen,et al.  Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[9]  Bhiksha Raj,et al.  Speech denoising using nonnegative matrix factorization with priors , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[10]  Hakan Erdogan,et al.  Semi-blind Speech-Music Separation Using Sparsity and Continuity Priors , 2010, 2010 20th International Conference on Pattern Recognition.

[11]  Hakan Erdogan,et al.  Single channel speech-music separation using matching pursuit and spectral masks , 2011, 2011 IEEE 19th Signal Processing and Communications Applications Conference (SIU).

[12]  Jérôme Idier,et al.  Algorithms for Nonnegative Matrix Factorization with the β-Divergence , 2010, Neural Computation.