Bimodal HCI-related affect recognition

Perhaps the most fundamental application of affective computing is Human-Computer Interaction (HCI), in which the computer should be able to detect and track the user's affective states and provide corresponding feedback. The human multi-sensory affect system sets the expectation for a multimodal affect analyzer. In this paper, we present our efforts toward audio-visual HCI-related affect recognition. With HCI applications in mind, we take into account several special affective states that indicate users' cognitive/motivational states. Because a facial expression is influenced by both the affective state and the speech content, we apply a smoothing method to extract affective-state information from facial features. In the fusion stage, a voting method combines the audio and visual modalities so that the final affect recognition accuracy is greatly improved. We test our bimodal affect recognition approach on 38 subjects with 11 HCI-related affect states. Extensive experimental results show that the average person-dependent affect recognition accuracy reaches almost 90% with our bimodal fusion.
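The abstract does not give implementation details for the fusion stage, but the voting scheme it describes can be illustrated with a minimal sketch. The function name, the label set, and the per-frame predictions below are hypothetical; the sketch only shows decision-level fusion, where unimodal classifier outputs are pooled and the majority-vote affect label wins.

```python
from collections import Counter

def fuse_by_voting(audio_preds, visual_preds):
    """Decision-level fusion: pool per-frame affect labels predicted
    by the audio and visual classifiers and return the majority vote."""
    votes = Counter(audio_preds) + Counter(visual_preds)
    return votes.most_common(1)[0][0]

# Hypothetical per-frame predictions from the two unimodal classifiers
audio_preds = ["frustration", "confusion", "frustration"]
visual_preds = ["frustration", "frustration", "interest"]
print(fuse_by_voting(audio_preds, visual_preds))  # -> frustration
```

A usage note on the design choice: voting at the decision level keeps the two modalities independent, so either classifier can be retrained or replaced without touching the other, at the cost of discarding cross-modal correlations that a feature-level fusion could exploit.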
