Methods for singing voice identification using energy coefficients as features

This paper describes two energy representations of the voice signal and tests their effectiveness in singing voice identification. The first set of energy features consists of the Mel-scale energies of 14 frequency bands covering the whole frequency spectrum of the signal. The second energy representation is obtained by wavelet decomposition of the voice signal, with the wavelet and scaling filters derived from fractional B-spline functions. The decomposition is performed hierarchically into 14 bands using octave-band filters, taking into account the specific frequencies of the formants. Both energy representations are tested for singing voice identification on the training set and on unknown data.
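A minimal sketch of both feature sets, offered as an illustration rather than the authors' implementation. It assumes NumPy, a 14-band triangular Mel filterbank built on the common 2595·log10(1 + f/700) Mel formula, and Haar filters in place of the fractional B-spline filters (the Haar pair corresponds to the simplest, degree-zero B-spline case). The 14 wavelet bands here come from a plain 13-level dyadic split (13 detail octaves plus the final approximation) and do not reproduce the formant-aware band layout described above. The function names `mel_band_energies` and `haar_octave_energies` are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    # Common Mel formula; the exact variant used in the paper is an assumption.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_band_energies(frame, fs, n_bands=14):
    """Energies of n_bands triangular, Mel-spaced bands covering 0..fs/2."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(n))) ** 2  # power spectrum
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # n_bands + 2 equally spaced Mel points give each band's lower edge, centre, upper edge.
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_bands + 2))
    energies = np.empty(n_bands)
    for b in range(n_bands):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        weights = np.clip(np.minimum(rising, falling), 0.0, None)  # triangle, peak 1 at mid
        energies[b] = np.sum(weights * spectrum)
    return energies


def haar_octave_energies(x, n_levels=13):
    """Energies of n_levels octave detail bands plus the final approximation band.

    Haar filters stand in for the fractional B-spline filters (an assumption).
    Index 0 is the highest octave (fs/4..fs/2); the last entry is the lowpass residue.
    """
    approx = np.asarray(x, dtype=float)
    energies = []
    for _ in range(n_levels):
        if len(approx) % 2:  # drop a trailing sample so samples pair up
            approx = approx[:-1]
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)  # highpass (wavelet) filter + downsample by 2
        approx = (even + odd) / np.sqrt(2.0)  # lowpass (scaling) filter + downsample by 2
        energies.append(np.sum(detail ** 2))
    energies.append(np.sum(approx ** 2))
    return np.array(energies)
```

Each call returns one 14-element energy vector. Because the Haar pair is orthonormal, the wavelet band energies sum to the signal energy when the input length is a power of two of at least 2^13 samples (about 0.5 s at 16 kHz). Applying either function frame by frame yields the feature sequence a classifier would consume; the frame length and classifier are left open here.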
