Monaural Blind Source Separation in the Context of Vocal Detection

In this paper, we evaluate the usefulness of several monaural blind source separation (BSS) algorithms in the context of vocal detection (VD). BSS is the problem of recovering several sources, given only a mixture. VD is the problem of automatically identifying the parts in a mixed audio signal, where at least one person is singing. We compare the results of three different strategies for utilising the estimated singing voice signals from four state-of-the-art source separation algorithms. In order to assess the performance of those strategies on an internal data set, we use two different feature sets, each fed to two different classifiers. After selecting the most promising approach, the results on two publicly available data sets are presented. In an additional experiment, we use the improved VD for a simple postprocessing technique: For the final estimation of the source signals, we decide to use either silence, or the mixed, or the separated signals, according to the VD. The results of traditionally used BSS evaluation methods suggest that this is useful for both the estimated background signals, as well as for the estimated vocals.

[1]  Daniel P. W. Ellis,et al.  Locating singing voice segments within music signals , 2001, Proceedings of the 2001 IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics (Cat. No.01TH8575).

[2]  Ye Wang,et al.  Singing voice detection in popular music , 2004, MULTIMEDIA '04.

[3]  Masataka Goto,et al.  RWC Music Database: Popular, Classical and Jazz Music Databases , 2002, ISMIR.

[4]  Hiromasa Fujihara,et al.  Timbre and Melody Features for the Recognition of Vocal Activity and Instrumental Solos in Polyphonic Music , 2011, ISMIR.

[5]  R. Gribonval,et al.  Proposals for Performance Measurement in Source Separation , 2003 .

[6]  Antoine Liutkus,et al.  Kernel Additive Models for Source Separation , 2014, IEEE Transactions on Signal Processing.

[7]  Bryan Pardo,et al.  A simple music/voice separation method based on the extraction of the repeating musical structure , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[8]  Björn W. Schuller,et al.  Combining monaural source separation with Long Short-Term Memory for increased robustness in vocalist gender recognition , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[9]  Antoine Liutkus,et al.  Kernel spectrogram models for source separation , 2014, 2014 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA).

[10]  Gerhard Widmer,et al.  On the reduction of false positives in singing voice detection , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[11]  Björn Schuller,et al.  Blind Enhancement of the Rhythmic and Harmonic Sections by NMF: Does it help? , 2009 .

[12]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[13]  Bryan Pardo,et al.  Music/Voice Separation Using the Similarity Matrix , 2012, ISMIR.

[14]  Emmanuel Vincent,et al.  Subjective and Objective Quality Assessment of Audio Source Separation , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[15]  Gaël Richard,et al.  A Musically Motivated Mid-Level Representation for Pitch Estimation and Musical Audio Source Separation , 2011, IEEE Journal of Selected Topics in Signal Processing.

[16]  Bryan Pardo,et al.  REpeating Pattern Extraction Technique (REPET): A Simple Method for Music/Voice Separation , 2013, IEEE Transactions on Audio, Speech, and Language Processing.

[17]  Emmanuel Vincent,et al.  A General Flexible Framework for the Handling of Prior Information in Audio Source Separation , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[18]  Gaël Richard,et al.  Vocal detection in music with support vector machines , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[19]  Yi-Hsuan Yang,et al.  Vocal activity informed singing voice separation with the iKala dataset , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[20]  Björn W. Schuller,et al.  Automatic Assessment of Singer Traits in Popular Music: Gender, Age, Height and Race , 2011, ISMIR.

[21]  Antoine Liutkus,et al.  Adaptive filtering for music/voice separation exploiting the repeating musical structure , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[22]  Paris Smaragdis,et al.  Singing-voice separation from monaural recordings using robust principal component analysis , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).