Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria

An unsupervised learning algorithm for the separation of sound sources in one-channel music signals is presented. The algorithm is based on factorizing the magnitude spectrogram of an input signal into a sum of components, each of which has a fixed magnitude spectrum and a time-varying gain. Each sound source, in turn, is modeled as a sum of one or more components. The parameters of the components are estimated by minimizing the reconstruction error between the input spectrogram and the model, while restricting the component spectrograms to be nonnegative and favoring components whose gains are slowly varying and sparse. Temporal continuity is favored by using a cost term which is the sum of squared differences between the gains in adjacent frames, and sparseness is favored by penalizing nonzero gains. The proposed iterative estimation algorithm is initialized with random values, and the gains and the spectra are then alternatively updated using multiplicative update rules until the values converge. Simulation experiments were carried out using generated mixtures of pitched musical instrument samples and drum sounds. The performance of the proposed method was compared with independent subspace analysis and basic nonnegative matrix factorization, which are based on the same linear model. According to these simulations, the proposed method enables a better separation quality than the previous algorithms. Especially, the temporal continuity criterion improved the detection of pitched musical sounds. The sparseness criterion did not produce significant improvements

[1]  Aapo Hyvärinen,et al.  Emergence of Phase- and Shift-Invariant Features by Decomposition of Natural Images into Independent Feature Subspaces , 2000, Neural Computation.

[2]  F. Downton,et al.  Introduction to Mathematical Statistics , 1959 .

[3]  Derry Fitzgerald,et al.  Automatic Drum Transcription and Source Separation , 2004 .

[4]  Tuomas Virtanen,et al.  Separation of sound sources by convolutive sparse coding , 2004, SAPA@INTERSPEECH.

[5]  Rémi Gribonval,et al.  Performance measurement in blind audio source separation , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[6]  DeLiang Wang,et al.  Monaural speech segregation based on pitch tracking and amplitude modulation , 2002, IEEE Transactions on Neural Networks.

[7]  Te-Won Lee,et al.  A Maximum Likelihood Approach to Single-channel Source Separation , 2003, J. Mach. Learn. Res..

[8]  Masataka Goto A Predominant-F0 Estimation Method for Real-world Musical Audio Signals: MAP Estimation for Incorporating Prior Knowledge about F0s and Tone Models , 2001 .

[9]  Christian Uhle,et al.  EXTRACTION OF DRUM TRACKS FROM POLYPHONIC MUSIC USING INDEPENDENT SUBSPACE ANALYSIS , 2003 .

[10]  Rémi Gribonval,et al.  Non negative sparse representation for Wiener based source separation with a single sensor , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[11]  P. Smaragdis,et al.  Non-negative matrix factorization for polyphonic music transcription , 2003, 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No.03TH8684).

[12]  A Lewis,et al.  THE SCIENCE OF SOUND , 1997 .

[13]  G. Reinsel,et al.  Introduction to Mathematical Statistics (4th ed.). , 1980 .

[14]  Aapo Hyvärinen,et al.  Fast and robust fixed-point algorithms for independent component analysis , 1999, IEEE Trans. Neural Networks.

[15]  Derry Fitzgerald,et al.  SUB-BAND INDEPENDENT SUBSPACE ANALYSIS FOR DRUM TRANSCRIPTION , 2002 .

[16]  Shankar Vembu,et al.  Separation of Vocals from Polyphonic Audio Recordings , 2005, ISMIR.

[17]  Jae Lim,et al.  Signal estimation from modified short-time Fourier transform , 1984 .

[18]  Erkki Oja,et al.  Independent Component Analysis , 2001 .

[19]  Mark D. Plumbley,et al.  Polyphonic transcription by non-negative sparse coding of power spectra , 2004, ISMIR.

[20]  Richard F. Lyon,et al.  Auditory model inversion for sound separation , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[21]  Xavier Rodet,et al.  Music Transcription with ISA and HMM , 2004, ICA.

[22]  Aaas News,et al.  Book Reviews , 1893, Buffalo Medical and Surgical Journal.

[23]  Orife Riddim A rhythm analysis and decomposition tool based on independent subspace analysis , 2001 .

[24]  Mark D. Plumbley,et al.  An Independent Component Analysis Approach to Automatic Music Transcription , 2003 .

[25]  Dimitri P. Bertsekas,et al.  Nonlinear Programming , 1997 .

[26]  Tuomas Virtanen,et al.  Sound Source Separation Using Sparse Coding with Temporal Continuity Objective , 2003, ICMC.

[27]  Michael A. Casey,et al.  Separation of Mixed Audio Sources By Independent Subspace Analysis , 2000, ICMC.

[28]  Shlomo Dubnov Extracting Sound Objects by Independent Subspace Analysis , 2002 .

[29]  P. Smaragdis,et al.  Independent component analysis for automatic note extraction from musical trills. , 2004, The Journal of the Acoustical Society of America.

[30]  M. Casey,et al.  MPEG-7 sound-recognition tools , 2001, IEEE Trans. Circuits Syst. Video Technol..

[31]  VirtanenTuomas Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria , 2007 .

[32]  Patrik O. Hoyer,et al.  Non-negative Matrix Factorization with Sparseness Constraints , 2004, J. Mach. Learn. Res..

[33]  Mike E. Davies,et al.  Sparse and shift-Invariant representations of music , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[34]  Paris Smaragdis,et al.  Mitsubishi Electric Research Laboratories , 1994 .

[35]  Sam T. Roweis,et al.  One Microphone Source Separation , 2000, NIPS.

[36]  David J. Field,et al.  Sparse coding with an overcomplete basis set: A strategy employed by V1? , 1997, Vision Research.

[37]  Tuomas Virtanen,et al.  Separation of drums from polyphonic music using non-negative matrix factorization and support vector machine , 2005, 2005 13th European Signal Processing Conference.