Partial clustering using a time-varying frequency model for singing voice detection

We propose a new method for grouping the partials produced by each instrument in a polyphonic audio mixture. The method applies to pitched, harmonic instruments and is particularly well suited to the singing voice. We model the time-varying frequency of each partial as a slowly varying frequency plus a sinusoidal modulation. The parameters of this model, together with common Auditory Scene Analysis principles, define a similarity measure between partials. This multi-criterion measure is then used to build the similarity matrix given as input to a clustering algorithm. The resulting clusters are groups of harmonically related partials. We evaluate how well the method groups partials by source when one of the sources is a singing voice. We show that partial clustering is a promising approach for singing voice detection and separation.
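As an illustration of the frequency model described above, the following Python sketch fits a slowly varying trend plus one sinusoidal modulation to a partial's frequency track, then compares two partials by their modulation parameters. It is a minimal sketch under stated assumptions, not the estimation procedure or similarity measure of the paper: the polynomial trend, the FFT-based rate search, the 2 to 12 Hz search band, the Gaussian kernel, and all function and parameter names (`fit_modulation_model`, `modulation_similarity`, `sigma_rate`, `sigma_depth`) are hypothetical choices made for the example.

```python
import numpy as np

def fit_modulation_model(t, f, deg=1, min_rate=2.0, max_rate=12.0):
    """Fit f(t) ~ trend(t) + amp * sin(2*pi*rate*t + phase).

    Assumptions (not from the paper): the slow part is a low-order
    polynomial, t is uniformly sampled, and the modulation rate lies
    in [min_rate, max_rate] Hz.
    """
    t = np.asarray(t, dtype=float)
    f = np.asarray(f, dtype=float)
    # Slowly varying part: low-order polynomial trend.
    trend = np.polyval(np.polyfit(t, f, deg), t)
    resid = f - trend
    # Modulation rate: strongest peak of the zero-padded spectrum
    # of the windowed residual, restricted to the search band.
    n = len(t)
    nfft = 8 * n
    spec = np.abs(np.fft.rfft(resid * np.hanning(n), nfft))
    freqs = np.fft.rfftfreq(nfft, d=t[1] - t[0])
    band = (freqs >= min_rate) & (freqs <= max_rate)
    rate = freqs[np.argmax(np.where(band, spec, 0.0))]
    # Amplitude and phase: linear least squares on a sin/cos basis,
    # using a*sin(x) + b*cos(x) = amp*sin(x + phase).
    x = 2.0 * np.pi * rate * t
    basis = np.column_stack([np.sin(x), np.cos(x)])
    (a, b), *_ = np.linalg.lstsq(basis, resid, rcond=None)
    return {
        "mean_freq": float(np.mean(trend)),
        "rate": float(rate),
        "amp": float(np.hypot(a, b)),
        "phase": float(np.arctan2(b, a)),
    }

def modulation_similarity(p, q, sigma_rate=0.5, sigma_depth=0.005):
    """Gaussian similarity on modulation rate and relative depth.

    Relative depth (amp / mean_freq) is used because the frequency
    excursion of a harmonic scales with its harmonic number.
    The sigma values are illustrative, untuned assumptions.
    """
    d_rate = (p["rate"] - q["rate"]) / sigma_rate
    d_depth = (p["amp"] / p["mean_freq"] - q["amp"] / q["mean_freq"]) / sigma_depth
    return float(np.exp(-0.5 * (d_rate ** 2 + d_depth ** 2)))

# Synthetic example: two harmonics of a voice with 6 Hz vibrato,
# and one steady partial from another source (10 ms frame hop).
t = np.arange(100) * 0.01
vibrato = 0.02 * np.sin(2.0 * np.pi * 6.0 * t)
h1 = fit_modulation_model(t, 220.0 * (1.0 + vibrato))
h2 = fit_modulation_model(t, 440.0 * (1.0 + vibrato))
steady = fit_modulation_model(t, np.full_like(t, 330.0))

sim_same_source = modulation_similarity(h1, h2)
sim_other_source = modulation_similarity(h1, steady)
```

In this synthetic case the two voice harmonics share the same rate and relative depth, so their similarity is close to 1, while the steady partial scores near 0. Pairwise values of this kind could populate one criterion of a similarity matrix; the paper combines several such criteria, which this sketch does not attempt to reproduce.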
