Comparative study of singing voice detection methods

Detecting singing segments in a soundtrack is an important and useful technique in musical signal processing and retrieval. In this paper, we study the accuracy of detecting singing segments using the HMM (Hidden Markov Model) classifier with various features, including MFCC (Mel Frequency Cepstral Coefficients), LPCC (Linear Predictive Cepstral Coefficients), and LPC (Linear Prediction Coefficients). Simulation results show that detecting singing segments in a soundtrack is more difficult than detecting them among pure-instrument segments. In addition, combining MFCC and LPCC yields higher accuracy. The bootstrapping technique provides only a limited improvement in accuracy when detecting all singing segments in a soundtrack. For completeness, we also conduct an experiment showing that the time needed to perform music identification can be reduced by more than 40% if the singing-voice detection mechanism is incorporated into the identification process.
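As an illustration of how two of the features named above relate, LPCC can be obtained from LPC through a standard cepstral recursion. The sketch below is not taken from the paper: the function name `lpc_to_lpcc`, the sign convention (all-pole model H(z) = 1 / (1 - sum_k a_k z^-k)), and the omission of the gain term are assumptions made here for illustration only.

```python
import numpy as np

def lpc_to_lpcc(a, n_ceps):
    """Convert LPC coefficients a_1..a_p to cepstral coefficients c_1..c_n_ceps.

    Assumes the all-pole model H(z) = 1 / (1 - sum_k a_k z^-k) and ignores
    the gain term c_0 (illustrative convention, not the paper's).
    """
    a = np.asarray(a, dtype=float)
    p = len(a)
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        # Direct term a_n exists only while n <= p.
        acc = a[n - 1] if n <= p else 0.0
        # Recursive term: sum over k of (k/n) * c_k * a_(n-k), with 1 <= n-k <= p.
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c

# Single-pole check: for a_1 = 0.5 the cepstrum is known to be c_n = 0.5**n / n.
lpcc = lpc_to_lpcc([0.5], 4)
```

A per-frame LPCC vector computed this way could then be concatenated with the MFCC vector of the same frame to form a combined feature; whether the paper combines the two features by concatenation is not stated in the abstract and is assumed here.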
