Single channel music source separation based on harmonic structure estimation

Single-channel music source separation is a useful but difficult problem in audio signal processing. This paper proposes a new method consisting of three stages: estimating the harmonic structure of each source in every frame through an iterative procedure applied to the spectral peaks of the mixture, clustering the estimated harmonics into the sources they belong to using pitch and formant information, and synthesizing each music source in the time domain. The method also handles the octave overlapping problem, a particularly hard case in single-channel source separation. Experimental results show that the algorithm separates the mixed signal and yields good subjective audio quality.
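To make the first stage more concrete, the following is a minimal sketch, not the paper's algorithm, of how the peaks in one frame of a mixture could be grouped into a harmonic structure. It assumes the fundamental frequency of each source is already known (the paper estimates it iteratively), and the function names, the 3% frequency tolerance, the peak threshold, and the synthetic two-note mixture are all illustrative assumptions.

```python
import numpy as np


def spectral_peaks(frame, sample_rate, min_rel_height=0.1):
    """Return (frequencies, magnitudes) of local maxima in the magnitude spectrum.

    The Hann window and the relative height threshold are assumptions of this sketch.
    """
    window = np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(frame * window))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    interior = spectrum[1:-1]
    # A bin is a peak if it exceeds its neighbours and a relative threshold.
    is_peak = (
        (interior > spectrum[:-2])
        & (interior >= spectrum[2:])
        & (interior > min_rel_height * spectrum.max())
    )
    idx = np.nonzero(is_peak)[0] + 1
    return freqs[idx], spectrum[idx]


def harmonic_structure(peak_freqs, peak_mags, f0, max_harmonics=10, tolerance=0.03):
    """Collect the peak nearest to each multiple of f0 (hypothetical helper).

    A peak is accepted as harmonic h if it lies within `tolerance` (relative)
    of h * f0. Returns a list of (harmonic number, frequency, magnitude).
    """
    harmonics = []
    if len(peak_freqs) == 0:
        return harmonics
    for h in range(1, max_harmonics + 1):
        target = h * f0
        nearest = int(np.argmin(np.abs(peak_freqs - target)))
        if abs(peak_freqs[nearest] - target) <= tolerance * target:
            harmonics.append((h, float(peak_freqs[nearest]), float(peak_mags[nearest])))
    return harmonics


# Synthetic single-channel mixture: two notes a fifth apart (assumed test signal).
sample_rate = 8000
n_samples = 4096
t = np.arange(n_samples) / sample_rate
note_a = sum(np.sin(2 * np.pi * 220.0 * h * t) / h for h in range(1, 5))
note_b = sum(np.sin(2 * np.pi * 330.0 * h * t) / h for h in range(1, 4))
mixture = note_a + note_b

peak_freqs, peak_mags = spectral_peaks(mixture, sample_rate)
harm_a = harmonic_structure(peak_freqs, peak_mags, f0=220.0)
harm_b = harmonic_structure(peak_freqs, peak_mags, f0=330.0)
print("harmonics assigned to 220 Hz:", [h for h, _, _ in harm_a])
print("harmonics assigned to 330 Hz:", [h for h, _, _ in harm_b])
```

In this toy mixture the peak near 660 Hz is claimed by both notes (third harmonic of 220 Hz, second harmonic of 330 Hz). When two notes are an octave apart, every harmonic of the upper note coincides with a harmonic of the lower one, which suggests why frequency matching alone is not enough and why the clustering stage draws on additional cues such as formant information.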
