论文信息 - On sparse and low-rank matrix decomposition for singing voice separation

On sparse and low-rank matrix decomposition for singing voice separation

Over recent years there has been a growing interest in finding ways to transform signals/matrices into sparse or low-rank representations, i.e., representations which are sparse in support or of low redundancy. Such decompositions are proving to be particularly powerful for a variety of signal processing and compression problems. In this paper, we investigate the application of this technique to the challenging task of singing voice/accompaniment separation for popular music. The vocal part is modeled as a sparse signal, whereas the instrumental part is considered to be low-rank. In addition, to better account for the particular properties of music, two new algorithms are proposed to improve the decomposition, including the incorporation of harmonicity priors and a back-end drum removal procedure. Evaluations on the MIR-1K benchmark dataset show that the proposed algorithms outperform the state-of-the-art by 0.01-2.41 db.

Yi-Hsuan Yang | Yi-Hsuan Yang

[1] Emmanuel Vincent,et al. A General Flexible Framework for the Handling of Prior Information in Audio Source Separation , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[2] Geoffroy Peeters,et al. Singing voice detection in music tracks using direct voice vibrato detection , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[3] Dong Liu,et al. Image retagging , 2010, ACM Multimedia.

[4] Petri Toiviainen,et al. MIR in Matlab (II): A Toolbox for Musical Feature Extraction from Audio , 2007, ISMIR.

[5] Guy J. Brown,et al. Computational Auditory Scene Analysis: Principles, Algorithms, and Applications , 2006 .

[6] Paris Smaragdis,et al. Singing-voice separation from monaural recordings using robust principal component analysis , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[7] Homer H. Chen,et al. Music Emotion Recognition , 2011 .

[8] Guillermo Sapiro,et al. Sparse Representation for Computer Vision and Pattern Recognition , 2010, Proceedings of the IEEE.

[9] Emmanuel J. Candès,et al. A Singular Value Thresholding Algorithm for Matrix Completion , 2008, SIAM J. Optim..

[10] Jyh-Shing Roger Jang,et al. On the Improvement of Singing Voice Separation for Monaural Recordings Using the MIR-1K Dataset , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[11] Shuicheng Yan,et al. Graph Embedding and Extensions: A General Framework for Dimensionality Reduction , 2007 .

[12] Yi Ma,et al. Robust principal component analysis? , 2009, JACM.

[13] Logan Volkers,et al. PHASE VOCODER , 2008 .

[14] Arvind Ganesh,et al. Fast Convex Optimization Algorithms for Exact Recovery of a Corrupted Low-Rank Matrix , 2009 .

[15] Zhijian Ou,et al. Combining HMM-based melody extraction and NMF-based soft masking for separating voice and accompaniment from monaural audio , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[16] Shuicheng Yan,et al. Graph embedding: a general framework for dimensionality reduction , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[17] Rémi Gribonval,et al. Performance measurement in blind audio source separation , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[18] Masataka Goto,et al. RWC Music Database: Music genre database and musical instrument sound database , 2003, ISMIR.

[19] Bryan Pardo,et al. A simple music/voice separation method based on the extraction of the repeating musical structure , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[20] Gaël Richard,et al. A Musically Motivated Mid-Level Representation for Pitch Estimation and Musical Audio Source Separation , 2011, IEEE Journal of Selected Topics in Signal Processing.

[21] Hendrik Purwins,et al. Sparse Approximations for Drum Sound Classification , 2011, IEEE Journal of Selected Topics in Signal Processing.

[22] Yi-Hsuan Yang,et al. Music retagging using label propagation and robust principal component analysis , 2012, WWW.

[23] Changshui Zhang,et al. Unsupervised Single-Channel Music Source Separation by Average Harmonic Structure Modeling , 2008, IEEE Transactions on Audio, Speech, and Language Processing.