Paper information - Emotion Recognition using Mel-Frequency Cepstral Coefficients

In this paper, we propose a new approach to emotion recognition. Most current emotion recognition algorithms rely on prosodic features, but such algorithms are not sufficiently accurate. We therefore focus on the phonetic features of speech and, in particular, describe the effectiveness of Mel-frequency cepstral coefficients (MFCCs) as features for emotion recognition. We concentrate on the precise classification of individual MFCC feature vectors rather than on their dynamics over an utterance. To realize this approach, the proposed algorithm performs multi-template emotion classification of the analysis frames. In speaker-independent experiments on four specific emotions, the proposed algorithm achieves 66.4% recognition accuracy. This is higher than the accuracy of conventional prosody-based and MFCC-based emotion recognition algorithms, which confirms the potential of the proposed algorithm.
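The abstract does not give implementation details, so the following is only a minimal sketch of what frame-level multi-template classification could look like, not the authors' method. It assumes that MFCC vectors have already been extracted for each analysis frame, that each emotion is represented by several template vectors, that frames are matched to templates by Euclidean distance, and that the utterance label is chosen by majority vote over frames. The function names and the placeholder labels `emotion_a` and `emotion_b` are hypothetical.

```python
import numpy as np


def classify_frames(frames, templates):
    """Label each frame with the emotion owning its nearest template.

    frames:    array of shape (T, D), one MFCC vector per analysis frame.
    templates: dict mapping an emotion label to an array of shape (K, D)
               holding that emotion's template vectors (assumed given).
    """
    labels = list(templates)
    # For every emotion, the distance from each frame to its closest template.
    nearest = np.stack(
        [
            np.min(
                np.linalg.norm(
                    frames[:, None, :] - templates[label][None, :, :], axis=2
                ),
                axis=1,
            )
            for label in labels
        ],
        axis=1,
    )  # shape (T, number of emotions)
    return [labels[i] for i in np.argmin(nearest, axis=1)]


def classify_utterance(frames, templates):
    """Assumed decision rule: majority vote over the per-frame labels."""
    frame_labels = classify_frames(frames, templates)
    values, counts = np.unique(frame_labels, return_counts=True)
    return str(values[np.argmax(counts)])


# Toy 2-dimensional data standing in for real MFCC vectors.
toy_templates = {
    "emotion_a": np.array([[0.0, 0.0], [1.0, 1.0]]),
    "emotion_b": np.array([[10.0, 10.0], [11.0, 11.0]]),
}
toy_frames = np.array([[0.1, 0.2], [1.1, 0.9], [10.0, 10.2]])
print(classify_frames(toy_frames, toy_templates))
print(classify_utterance(toy_frames, toy_templates))
```

Because each frame is classified independently, this sketch ignores the temporal order of frames, which matches the abstract's emphasis on classifying individual feature vectors rather than modeling their dynamics over an utterance.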

Yasunari Obuchi | Nobuo Sato
