Monaural Music Source Separation: Nonnegativity, Sparseness, and Shift-Invariance

In this paper we present a method for separating polyphonic music sources from their monaural mixture. The underlying assumption is that the harmonic structure of a musical instrument remains roughly the same even when it is played at different pitches and recorded in different mixing environments. We incorporate nonnegativity, shift-invariance, and sparseness to select representative spectral basis vectors, which are then used to restore the music sources from their monaural mixture. Experimental results on a monaural instantaneous mixture of voice and cello and a monaural convolutive mixture of saxophone and viola confirm the validity of the proposed method.
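The abstract does not give update rules. The sketch below is a rough illustration of how nonnegativity, shift-invariance, and sparseness can be combined in one factorization. It is a minimal shift-invariant sparse NMF in Python/NumPy and rests on several assumptions. The input is a log-frequency magnitude spectrogram with 12 bins per octave. The cost is Euclidean, with an L1 penalty on the activations. The updates are multiplicative. On such an axis a pitch change moves every partial by the same number of bins, so one template `W` shifted by `p` bins can represent an instrument at many pitches. The function names, the cost, and the parameters (`n_shifts`, `sparsity`) are hypothetical choices for illustration, not the authors' algorithm. In particular, the paper's selection of representative basis vectors is not reproduced here.

```python
import numpy as np


def shift_down(X, s):
    """Move the rows of X down by s bins (up in pitch on a log-frequency axis), zero-filling."""
    Y = np.zeros_like(X)
    Y[s:] = X[: X.shape[0] - s]
    return Y


def shift_up(X, s):
    """Adjoint of shift_down: move the rows of X up by s bins, zero-filling the bottom."""
    Y = np.zeros_like(X)
    Y[: X.shape[0] - s] = X[s:]
    return Y


def reconstruct(W, H):
    """Model spectrogram: sum over shifts p of S_p W H[p]."""
    return sum(shift_down(W, p) @ H[p] for p in range(H.shape[0]))


def shift_invariant_sparse_nmf(V, n_components, n_shifts, sparsity=0.1,
                               n_iter=200, seed=0, eps=1e-12):
    """Factor a nonnegative log-frequency spectrogram V (F x T) (hypothetical helper).

    Minimizes 0.5 * ||V - sum_p S_p W H[p]||_F^2 + sparsity * sum(H)
    with multiplicative updates, where S_p shifts W down by p bins.
    Returns the basis W (F x K), activations H (n_shifts x K x T), and the
    relative reconstruction error before the first and after each iteration.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, n_components)) + eps  # nonnegative random start
    H = rng.random((n_shifts, n_components, T)) + eps
    v_norm = np.linalg.norm(V)
    errors = [np.linalg.norm(V - reconstruct(W, H)) / v_norm]
    for _ in range(n_iter):
        # Activations: one shift at a time, holding the other shifts fixed.
        # The constant `sparsity` in the denominator is the L1 (sparseness) term.
        for p in range(n_shifts):
            Ws = shift_down(W, p)
            Lam = reconstruct(W, H)
            H[p] *= (Ws.T @ V) / (Ws.T @ Lam + sparsity + eps)
        # Basis: a single template shared by every shift (the shift-invariance).
        Lam = reconstruct(W, H)
        num = sum(shift_up(V @ H[p].T, p) for p in range(n_shifts))
        den = sum(shift_up(Lam @ H[p].T, p) for p in range(n_shifts))
        W *= num / (den + eps)
        # Remove the scale ambiguity: unit-norm basis columns, rescale H to match.
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[None, :, None]
        errors.append(np.linalg.norm(V - reconstruct(W, H)) / v_norm)
    return W, H, errors


if __name__ == "__main__":
    # Hypothetical toy data: one harmonic template (partials 1-4 on a
    # 12-bins-per-octave axis) played at a random pitch in every frame.
    F, T, P = 48, 30, 12
    template = np.zeros((F, 1))
    template[[0, 12, 19, 24], 0] = [1.0, 0.5, 0.33, 0.25]
    pitches = np.random.default_rng(1).integers(0, P, size=T)
    V = np.hstack([shift_down(template, s) for s in pitches])
    W, H, errors = shift_invariant_sparse_nmf(V, n_components=1, n_shifts=P, sparsity=0.01)
    print(f"relative error: {errors[0]:.3f} -> {errors[-1]:.3f}")
```

In the toy run, the same four-partial pattern appears at a different pitch in each frame. A single shifted template can explain every frame, and `H[p]` indicates how strongly that template is active at shift `p`. Sequential updates over shifts keep each activation step a standard multiplicative update, which does not increase its block's penalized cost. For a real mixture one would typically compute a constant-Q or other log-frequency magnitude spectrogram first and allow several components per instrument.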
