The Mel-Frequency Cepstral Coefficients in the Context of Singer Identification

The singing voice is the oldest and most complex musical instrument. Humans easily recognize a familiar singer’s voice, even when hearing a song for the first time. For machines, however, automatic singer identification remains a difficult task among sound source identification applications. Signal processing techniques aim to extract features that relate to the characteristics of a singer's identity. The research presented in this paper considers 32 Mel-Frequency Cepstral Coefficients (MFCCs) in two subsets: the low-order MFCCs, which characterize the vocal tract resonances, and the high-order MFCCs, which relate to the glottal wave shape. We explore how well singers can be identified and discriminated using each of the two subsets. Based on the results, we conclude that both subsets contribute to defining the identity of the voice, but the high-order subset is more robust to changes in singing style.
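
To make the two-subset idea concrete, the following is a minimal sketch of how 32 MFCCs per frame might be computed and then divided into a low-order and a high-order group. It is not the procedure used in the paper: the frame length, hop size, number of mel filters, the helper names (`mel_filterbank`, `mfcc`, `split_mfcc`), and in particular the split index of 13 are all assumptions chosen for illustration, since the abstract does not state where the boundary between the subsets lies. The input is a synthetic tone standing in for a recording.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters with centers equally spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def mfcc(signal, sr, n_mfcc=32, n_filters=40, n_fft=1024, hop=512):
    """Return an (n_frames, n_mfcc) array of cepstral coefficients.

    All default parameter values are illustrative assumptions.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Small constant avoids log(0) for silent frames.
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # Unnormalized DCT-II over the mel bands, keeping the first n_mfcc terms.
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2.0 * n_filters))
    return log_mel @ dct.T

def split_mfcc(coeffs, split_at=13):
    """Split coefficients into low-order and high-order subsets.

    split_at=13 is a hypothetical boundary, not a value from the paper.
    """
    return coeffs[:, :split_at], coeffs[:, split_at:]

# Synthetic one-second test signal (a 220 Hz tone with two harmonics).
sr = 16000
t = np.arange(sr) / sr
signal = sum(np.sin(2 * np.pi * 220.0 * h * t) / h for h in (1, 2, 3))

coeffs = mfcc(signal, sr)
low, high = split_mfcc(coeffs)
print(coeffs.shape, low.shape, high.shape)
```

Under these assumptions, each subset could then be passed to a classifier separately to compare how much singer-specific information it carries.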
