Multimodal human emotion/expression recognition

Recognizing human facial expression and emotion by computer is an interesting and challenging problem. Many have investigated emotional contents in speech alone, or recognition of human facial expressions solely from images. However, relatively little has been done in combining these two modalities for recognizing human emotions. L.C. De Silva et al. (1997) studied human subjects' ability to recognize emotions from viewing video clips of facial expressions and listening to the corresponding emotional speech stimuli. They found that humans recognize some emotions better by audio information, and other emotions better by video. They also proposed an algorithm to integrate both kinds of inputs to mimic human's recognition process. While attempting to implement the algorithm, we encountered difficulties which led us to a different approach. We found these two modalities to be complimentary. By using both, we show it is possible to achieve higher recognition rates than either modality alone.

[1]  Tom Johnstone,et al.  Emotional speech elicited using computer games , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[2]  Roddy Cowie,et al.  Automatic statistical analysis of the signal and prosodic signs of emotion in speech , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[3]  P. Ekman Emotion in the human face , 1982 .

[4]  P. Ekman,et al.  Strong evidence for universals in facial expressions: a reply to Russell's mistaken critique. , 1994, Psychological bulletin.

[5]  Jun Ohya,et al.  Recognizing multiple persons' facial expressions using HMM based on automatic extraction of significant frames from image sequences , 1997, Proceedings of International Conference on Image Processing.

[6]  Iain R. Murray,et al.  Toward the simulation of emotion in synthetic speech: a review of the literature on human vocal emotion. , 1993, The Journal of the Acoustical Society of America.

[7]  Kenji Mase,et al.  Recognition of Facial Expression from Optical Flow , 1991 .

[8]  Shigeo Morishima,et al.  Emotion modeling in speech production using emotion space , 1996, Proceedings 5th IEEE International Workshop on Robot and Human Communication. RO-MAN'96 TSUKUBA.

[9]  Larry S. Davis,et al.  Human expression recognition from motion using a radial basis function network architecture , 1996, IEEE Trans. Neural Networks.

[10]  Shigeo Morishima,et al.  Expression analysis/synthesis system based on emotion space constructed by multilayered neural network , 1994 .

[11]  Alex Pentland,et al.  Coding, Analysis, Interpretation, and Recognition of Facial Expressions , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[12]  Hiroya Fujisaki,et al.  Prosody, Models, and Spontaneous Speech , 1997, Computing Prosody.

[13]  Frank Dellaert,et al.  Recognizing emotion in speech , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.