Recognizing emotions for audio-visual document indexing

In this paper, we propose using mel-frequency cepstral coefficients (MFCCs) together with a simple but efficient classification method, vector quantization (VQ), to perform speaker-dependent emotion recognition. Several other features (energy, pitch, zero-crossing rate, phonetic rate, LPC coefficients) and their derivatives are also tested and combined with MFCCs in order to find the best combination. Other models, Gaussian mixture models (GMMs) and discrete and continuous hidden Markov models (HMMs), are studied as well, in the hope that continuous distributions and the temporal evolution of the feature set will improve the quality of emotion recognition. The accuracy in recognizing five different emotions exceeds 80% using MFCCs alone with the VQ model. This simple but efficient approach even outperforms human evaluation on the same database, in which listeners judged sentences without the possibility of replaying them or comparing between sentences (Engberg and Hansen, 2001).
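
To make the MFCC + VQ approach concrete, the sketch below shows one plausible realization in Python: a per-emotion codebook is trained with k-means over MFCC frames, and a test utterance is assigned the emotion whose codebook yields the lowest average quantization distortion. This is a minimal sketch, not the authors' implementation; the libraries (librosa, SciPy), function names, and parameters (13 MFCCs, 64-entry codebooks) are assumptions chosen for illustration.

```python
import numpy as np
import librosa
from scipy.cluster.vq import kmeans, vq

def extract_mfcc(path, n_mfcc=13):
    """Load an utterance and return its MFCC frames (n_frames x n_mfcc).
    n_mfcc=13 is a common default, not a value taken from the paper."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.T  # one feature vector per frame

def train_codebooks(train_files, codebook_size=64):
    """Train one VQ codebook per emotion from pooled MFCC frames.

    train_files: dict mapping emotion label -> list of wav paths,
    all from the same speaker (the task is speaker-dependent).
    """
    codebooks = {}
    for emotion, paths in train_files.items():
        frames = np.vstack([extract_mfcc(p) for p in paths])
        # k-means clustering of the frames gives the VQ codebook
        codebook, _ = kmeans(frames.astype(float), codebook_size)
        codebooks[emotion] = codebook
    return codebooks

def classify(path, codebooks):
    """Assign the emotion whose codebook quantizes the utterance's
    MFCC frames with the lowest average distortion."""
    frames = extract_mfcc(path).astype(float)
    distortions = {}
    for emotion, codebook in codebooks.items():
        _, dists = vq(frames, codebook)  # per-frame distances to nearest codeword
        distortions[emotion] = dists.mean()
    return min(distortions, key=distortions.get)
```

In use, train_codebooks would receive a dict such as {'anger': [...], 'joy': [...]} of wav files from one speaker, and classify would return the predicted label for a held-out utterance; the per-emotion codebook with minimum-distortion decision is the standard way VQ is applied as a classifier.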