Singing Voice Separation Using RPCA with Weighted \(l_{1}\)-norm

In this paper, we present an extension of robust principal component analysis (RPCA) with weighted \(l_{1}\)-norm minimization for singing voice separation. Whereas conventional RPCA balances the low-rank and sparse matrices with a single uniform weight, we use a different weighting parameter for each frequency bin of the spectrogram, obtained by estimating the variance ratio between the singing voice and the accompaniment. In addition, we incorporate the results of vocal activation detection into the construction of the weighting matrix and use that matrix in the final decomposition framework. In experiments on the DSD100 dataset, the proposed algorithm yields a meaningful improvement in separation performance over conventional RPCA.
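To make the idea of a weighted \(l_{1}\) penalty concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the decomposition can be written as minimizing \(\|L\|_{*} + \|W \odot S\|_{1}\) subject to \(M = L + S\), where \(M\) is a magnitude spectrogram (frequency bins by frames), and it solves this with a standard inexact augmented Lagrangian scheme. The function names, the `inactive_scale` parameter, and the way per-bin weights and vocal activity are combined into \(W\) are all hypothetical; the abstract does not specify them, and the variance-ratio estimation itself is not shown.

```python
import numpy as np


def soft_threshold(X, tau):
    """Elementwise shrinkage; tau may be a scalar or an array shaped like X."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def build_weight_matrix(freq_weights, vocal_active, lam, inactive_scale=5.0):
    """Hypothetical weighting matrix W (bins x frames).

    freq_weights:   one positive weight per frequency bin (assumed given).
    vocal_active:   boolean flag per frame from a vocal activity detector.
    inactive_scale: assumed factor that penalizes the sparse (vocal) part
                    more strongly in frames flagged as non-vocal.
    """
    frame_scale = np.where(vocal_active, 1.0, inactive_scale)
    return lam * freq_weights[:, None] * frame_scale[None, :]


def weighted_rpca(M, W, n_iter=200, tol=1e-7, rho=1.5):
    """Sketch of min ||L||_* + ||W * S||_1  s.t.  M = L + S (inexact ALM)."""
    norm_M = np.linalg.norm(M, "fro")
    mu = 1.25 / np.linalg.norm(M, 2)  # common initial penalty choice
    Y = np.zeros_like(M)              # Lagrange multipliers
    S = np.zeros_like(M)
    L = np.zeros_like(M)
    for _ in range(n_iter):
        # Low-rank update: singular value thresholding with threshold 1/mu.
        U, s, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # Sparse update: each entry gets its own threshold W_ij / mu.
        S = soft_threshold(M - L + Y / mu, W / mu)
        # Dual update and penalty growth.
        R = M - L - S
        Y = Y + mu * R
        mu = min(mu * rho, 1e7)
        if np.linalg.norm(R, "fro") <= tol * norm_M:
            break
    return L, S
```

Setting every entry of `W` to the same value, for example `1 / sqrt(max(M.shape))`, reduces this sketch to conventional RPCA with a uniform weight. In a separation pipeline, `L` would typically be read as the accompaniment estimate and `S` as the vocal estimate, with the mixture phase reused for resynthesis.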
