Vocal activity informed singing voice separation with the iKala dataset

A new algorithm is proposed for robust principal component analysis with predefined sparsity patterns. The algorithm is then applied to separate the singing voice from the instrumental accompaniment using vocal activity information. To evaluate its performance, we construct the new, publicly available iKala dataset, which features longer clip durations and higher audio quality than the existing MIR-1K dataset for singing voice separation. Part of the dataset will be used in the MIREX Singing Voice Separation task. Experimental results on both the MIR-1K dataset and the new iKala dataset confirm that the more informed the algorithm is, the better the separation results are.
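To make the idea of a predefined sparsity pattern concrete, the following is a minimal Python/NumPy sketch, not the authors' implementation: standard RPCA via the inexact augmented Lagrange multiplier method decomposes a magnitude spectrogram into a low-rank accompaniment part and a sparse vocal part, and one simple way to inject vocal activity information is to force the sparse component to zero in frames labeled as non-vocal. The function and parameter names (`rpca_vocal_informed`, `vocal_mask`) are illustrative assumptions.

```python
import numpy as np

def rpca_vocal_informed(M, vocal_mask, lam=None, tol=1e-7, max_iter=500):
    """Sketch of RPCA (inexact ALM) with a predefined sparsity pattern.

    M          : non-negative magnitude spectrogram (freq bins x frames)
    vocal_mask : boolean array of length n_frames, True where vocals are active
    Returns (L, S): low-rank accompaniment estimate and sparse vocal estimate.
    """
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    norm_M = np.linalg.norm(M, 'fro')
    spec_norm = np.linalg.norm(M, 2)                       # largest singular value
    Y = M / max(spec_norm, np.abs(M).max() / lam)          # dual variable init (Lin et al.)
    mu, rho = 1.25 / spec_norm, 1.5
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    allowed = np.broadcast_to(vocal_mask[None, :], M.shape)  # where S may be nonzero

    for _ in range(max_iter):
        # Low-rank update: singular value thresholding of (M - S + Y/mu)
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt

        # Sparse update: soft thresholding, then enforce the sparsity pattern
        T = M - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        S[~allowed] = 0.0                                   # zero vocals in non-vocal frames

        # Dual update and convergence check
        Z = M - L - S
        Y = Y + mu * Z
        mu = rho * mu
        if np.linalg.norm(Z, 'fro') / norm_M < tol:
            break
    return L, S
```

In a typical pipeline of this kind, M would be the magnitude of an STFT of the mixture, and the separated magnitudes L and S would be combined with the mixture phase (optionally through a time-frequency mask) before inverting the STFT; the exact masking and post-processing choices here are assumptions, not details taken from the paper.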
