On fusion of timbre-motivated features for singing voice detection and singer identification

Timbre is the quality of sound that allows the ear to distinguish between musical sounds. In this paper, we study the effect of timbre on the identification of singing voice segments in popular songs. First, we distinguish between singing voice and instrumental segments in a song. Then, the singing voice segments are further categorized according to singer identity. Timbre-motivated effects are modeled by fusing systems that use vibrato features, harmonic information, and other features extracted using Mel and log frequency scale filter banks. To improve singer identification performance, we propose statistical methods for selecting singing voice segments with high confidence measures. Experiments conducted on a database of 214 popular songs show that the proposed approach is effective.
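As a rough illustration of the two ideas named in the abstract (fusing several feature-specific systems, then keeping only high-confidence vocal segments), the following minimal sketch may help. The abstract does not specify the fusion rule or the statistical selection method, so everything here is an assumption: the weighted-average fusion, the mean-plus-k-standard-deviations threshold, the function names `fuse_scores` and `select_confident`, and the toy scores are all hypothetical.

```python
from statistics import mean, stdev


def fuse_scores(system_scores, weights=None):
    """Score-level fusion: weighted average of per-segment scores.

    system_scores is a list of score lists, one list per system
    (e.g. vibrato, harmonic, filter-bank), all of the same length.
    Equal weights are assumed when none are given.
    """
    n_systems = len(system_scores)
    if weights is None:
        weights = [1.0 / n_systems] * n_systems
    n_segments = len(system_scores[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, system_scores))
        for i in range(n_segments)
    ]


def select_confident(fused, k=0.5):
    """Return indices of segments scoring above mean + k * std.

    This threshold is an illustrative stand-in for the paper's
    statistical selection method, which the abstract does not describe.
    """
    threshold = mean(fused) + k * stdev(fused)
    return [i for i, score in enumerate(fused) if score > threshold]


# Hypothetical per-segment "singing voice" scores from three systems.
vibrato = [0.9, 0.2, 0.8, 0.1]
harmonic = [0.7, 0.4, 0.9, 0.3]
filterbank = [0.8, 0.3, 0.7, 0.2]

fused = fuse_scores([vibrato, harmonic, filterbank])
confident = select_confident(fused)
print(confident)
```

In this toy setup, only the segments indexed by `confident` would be passed on to the singer identification stage. Raising `k` makes the selection stricter.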
