Timbre and Melody Features for the Recognition of Vocal Activity and Instrumental Solos in Polyphonic Music

We propose the task of detecting instrumental solos in polyphonic music recordings, together with a set of four audio features for vocal and instrumental activity detection. Three of the features are based on a prior extraction of the predominant melody line and have not previously been used for vocal/instrumental activity detection. Using a support vector machine hidden Markov model (SVM-HMM), we conduct 14 experiments to validate several combinations of the proposed features. Our results clearly demonstrate the benefit of combining the features: the best performance was always achieved by combining all four. The top accuracy for vocal activity detection is 87.2%. The more difficult task of detecting instrumental solos benefits equally from combining all features, achieving an accuracy of 89.8% and a satisfactory precision of 61.1%. With this paper we also publicly release the 102 annotations used for training and testing. Besides vocal/nonvocal labels, the annotations distinguish between female and male singers and between different solo instruments.
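An SVM-HMM labels a whole sequence of frames at once rather than classifying each frame independently: per-frame label scores are combined with label-transition scores, and the best label sequence is found by Viterbi decoding. The abstract does not give the model details, so the following is only a minimal, generic sketch of that decoding step, assuming additive (log-domain) scores from a linear model. The function name `viterbi_decode`, the score layout, and the label mapping (0 = nonvocal, 1 = vocal) are hypothetical illustrations, not the authors' implementation; training the weights (e.g. with a cutting-plane solver) is not shown.

```python
from typing import List, Sequence


def viterbi_decode(
    emission: Sequence[Sequence[float]],
    transition: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring label sequence (hypothetical sketch).

    emission[t][k]   -- score of label k at frame t (e.g. w_k . x_t in a linear model)
    transition[j][k] -- score for moving from label j to label k
    All scores are additive, as in a linear SVM-HMM objective.
    """
    n_frames = len(emission)
    if n_frames == 0:
        return []
    n_labels = len(emission[0])

    # best[k]: best total score of any label path ending in label k at the current frame
    best = list(emission[0])
    backptr: List[List[int]] = []

    for t in range(1, n_frames):
        new_best: List[float] = []
        ptrs: List[int] = []
        for k in range(n_labels):
            # Pick the predecessor label j that maximises best[j] + transition[j][k]
            j_star = max(range(n_labels), key=lambda j: best[j] + transition[j][k])
            new_best.append(best[j_star] + transition[j_star][k] + emission[t][k])
            ptrs.append(j_star)
        best = new_best
        backptr.append(ptrs)

    # Trace back from the best-scoring final label
    label = max(range(n_labels), key=lambda k: best[k])
    path = [label]
    for ptrs in reversed(backptr):
        label = ptrs[label]
        path.append(label)
    path.reverse()
    return path


# Illustrative usage: label 0 = nonvocal, 1 = vocal (assumed mapping).
# Switching labels costs 1.0, so a single weakly "vocal" frame is smoothed away.
emission = [[1.0, 0.0], [1.0, 0.0], [0.0, 0.4], [1.0, 0.0], [1.0, 0.0]]
transition = [[0.0, -1.0], [-1.0, 0.0]]
print(viterbi_decode(emission, transition))  # [0, 0, 0, 0, 0]
```

The transition scores are what distinguish this from frame-wise classification: a frame-wise argmax would label the third frame as vocal, whereas the sequence decoder keeps the segment nonvocal because a short excursion does not outweigh two label changes.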