Singing voice detection using twice-iterated composite Fourier transform

In this paper, we propose a twice-iterated composite Fourier transform (TICFT) technique for detecting singing voice boundaries in polyphonic acoustic music signals. We show that the cumulative TICFT energy in the lower coefficients can differentiate the harmonic structures of vocal and instrumental music in higher octaves. The music signal is first segmented into frames based on quarter notes. TICFT is then used to measure the harmonic structure of each frame. Finally, the frames are classified as vocal or instrumental by applying music domain knowledge. Experimental results show that a frame-level accuracy of over 80% can be achieved.
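
The framing and energy measurement described above can be illustrated with a minimal sketch. The abstract does not define TICFT, so the code rests on loudly stated assumptions: "twice-iterated" is read as taking a second Fourier transform of the magnitude spectrum produced by the first, a fixed tempo is assumed so that each frame spans one quarter note (60 / bpm seconds), and the function names, the `num_low` cutoff, and the normalisation to an energy ratio are all hypothetical choices made for illustration. It is not the authors' implementation, and the final classification step (music domain knowledge) is left out.

```python
import cmath
import math


def quarter_note_frames(signal, sample_rate, bpm):
    """Split a signal into non-overlapping frames of one quarter note each.

    Assumes a fixed tempo; a trailing partial frame is dropped.
    """
    frame_len = int(round(sample_rate * 60.0 / bpm))
    if frame_len < 1:
        raise ValueError("frame length must be at least one sample")
    last_start = len(signal) - frame_len
    return [signal[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


def dft_magnitude(samples):
    """Magnitudes of a naive O(n^2) discrete Fourier transform."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n)
    ]


def low_coefficient_energy_ratio(frame, num_low=8):
    """Share of second-pass energy held by the lowest `num_low` coefficients.

    Assumed reading of TICFT: transform the frame, take the magnitude
    spectrum, then transform that spectrum again.
    """
    first = dft_magnitude(frame)
    half = first[: len(first) // 2]  # non-redundant half for real-valued input
    second = dft_magnitude(half)
    energy = [m * m for m in second]
    total = sum(energy)
    return sum(energy[:num_low]) / total if total > 0 else 0.0


if __name__ == "__main__":
    # Toy input: a 3-cycle sine in each 32-sample frame (hypothetical values).
    tone = [math.sin(2 * math.pi * 3 * t / 32) for t in range(64)]
    for frame in quarter_note_frames(tone, sample_rate=32, bpm=60):
        print(round(low_coefficient_energy_ratio(frame), 3))
```

The naive transform keeps the sketch dependency-free but is too slow for real quarter-note frames at audio sample rates; an FFT routine would replace `dft_magnitude` in practice. Which range of the ratio corresponds to vocal frames is not stated in the abstract, so no threshold is suggested here.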
