On the reduction of false positives in singing voice detection

Motivated by the observation that one of the biggest problems in automatic singing voice detection is the confusion of vocals with other pitch-continuous and pitch-varying instruments, we propose a set of three new audio features designed to reduce the number of false vocal detections. Comparative experiments on three different musical corpora bear out this reduction. The resulting singing voice detector appears to be at least on par with more complex state-of-the-art methods. The new features and the classifier are very lightweight and, in principle, suitable for on-line use.
