Emotion Recognition Based on Joint Visual and Audio Cues

Recent technological advances have enabled human users to interact with computers in ways previously unimaginable. Beyond the confines of the keyboard and mouse, new modalities for human-computer interaction such as voice, gesture, and force-feedback are emerging. However, one necessary ingredient for natural interaction is still missing - emotions. This paper describes the problem of bimodal emotion recognition and advocates the use of probabilistic graphical models when fusing the different modalities. We test our audio-visual emotion recognition approach on 38 subjects with 11 HCI-related affect states. The experimental results show that the average person-dependent emotion recognition accuracy is greatly improved when both visual and audio information are used in classification

[1]  Yasunari Yoshitomi,et al.  Effect of sensor fusion for recognition of emotional states using voice, face image and thermal image of face , 2000, Proceedings 9th IEEE International Workshop on Robot and Human Interactive Communication. IEEE RO-MAN 2000 (Cat. No.00TH8499).

[2]  S. Demleitner [Communication without words]. , 1997, Pflege aktuell.

[3]  Zhihong Zeng,et al.  Bimodal HCI-related affect recognition , 2004, ICMI '04.

[4]  Chun Chen,et al.  Audio-visual based emotion recognition - a new approach , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[5]  Nicu Sebe,et al.  MULTIMODAL EMOTION RECOGNITION , 2005 .

[6]  Christine L. Lisetti,et al.  Modeling Multimodal Expression of User’s Affective Subjective Experience , 2002, User Modeling and User-Adapted Interaction.

[7]  L. Rothkrantz,et al.  Toward an affect-sensitive multimodal human-computer interaction , 2003, Proc. IEEE.

[8]  Vladimir Pavlovic,et al.  Boosted learning in dynamic Bayesian networks for multimodal speaker detection , 2003, Proc. IEEE.

[9]  Nicu Sebe,et al.  Affective multimodal human-computer interaction , 2005, ACM Multimedia.

[10]  Nicu Sebe,et al.  Facial expression recognition from video sequences: temporal and static modeling , 2003, Comput. Vis. Image Underst..

[11]  Thomas S. Huang,et al.  Connected vibrations: a modal analysis approach for non-rigid motion tracking , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[12]  Thomas S. Huang,et al.  Emotional expressions in audiovisual human computer interaction , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).