Recognizing human emotion from audiovisual information

In this paper, we present an emotion recognition system that classifies human emotional state from audiovisual signals. We extract prosodic, mel-frequency cepstral coefficient (MFCC), and formant frequency features to represent the audio characteristics of emotional speech. A face detection scheme based on the HSV color model is used to separate the face from the background, and facial expressions are represented by Gabor wavelet features. Feature selection is performed with a stepwise method based on the Mahalanobis distance. A classification scheme is proposed that analyzes both individual classes and combinations of classes. The system is tested on a language- and race-independent database, achieving an overall recognition accuracy of 82.14%.
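The HSV-based face detection step typically relies on the observation that skin tones cluster in a narrow hue band, largely independent of brightness. The abstract does not give the paper's thresholds, so the ranges below are illustrative assumptions only; a minimal per-pixel skin test might look like:

```python
import colorsys

def is_skin_pixel(r, g, b):
    """Rough skin-color test in HSV space.

    The hue/saturation/value thresholds here are hypothetical, chosen
    only to illustrate the idea; the paper's actual ranges are not
    stated in the abstract.
    """
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # Skin hues sit near red/orange: roughly 0-50 degrees (or wrapping past 340).
    hue_ok = h <= 50 / 360.0 or h >= 340 / 360.0
    # Moderate saturation and sufficient brightness exclude gray/dark background.
    return hue_ok and 0.15 <= s <= 0.9 and v >= 0.35
```

In a full detector, pixels passing this test would be grouped into connected regions, and the largest face-shaped region kept as the face candidate.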
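Stepwise feature selection with a Mahalanobis criterion can be sketched as a greedy loop: at each step, add the candidate feature that most increases the Mahalanobis distance between class means under a pooled covariance. This is an illustrative two-class re-implementation of the general idea, not the paper's exact procedure:

```python
import numpy as np

def mahalanobis_score(features, labels):
    """Squared Mahalanobis distance between the two class means,
    using the pooled within-class covariance.
    features: (n_samples, n_dims); labels: binary 0/1 array."""
    a, b = features[labels == 0], features[labels == 1]
    mu_diff = a.mean(axis=0) - b.mean(axis=0)
    pooled = (np.cov(a, rowvar=False) * (len(a) - 1)
              + np.cov(b, rowvar=False) * (len(b) - 1)) / (len(a) + len(b) - 2)
    pooled = np.atleast_2d(pooled)  # handle the single-feature case
    return float(mu_diff @ np.linalg.solve(pooled, mu_diff))

def stepwise_select(features, labels, k):
    """Greedy forward selection of k features maximizing the score."""
    selected, remaining = [], list(range(features.shape[1]))
    for _ in range(k):
        best = max(remaining, key=lambda j: mahalanobis_score(
            features[:, selected + [j]], labels))
        selected.append(best)
        remaining.remove(best)
    return selected
```

For multi-class emotion data, the same loop would use a multi-class separability measure (e.g. the minimum pairwise distance over all class pairs) in place of the two-class score.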
