Audio-Visual Affect Recognition

The ability of a computer to detect and respond appropriately to changes in a user's affective state has significant implications for human-computer interaction (HCI). In this paper, we present our efforts toward audio-visual affect recognition of 11 affective states customized for HCI applications (four cognitive/motivational states and seven basic affective states), collected from 20 non-actor subjects. A smoothing method is proposed to reduce the detrimental influence of speech on facial expression recognition. Feature selection analysis shows that, while speaking, subjects tend to express affect through brow movement in the face and through pitch and energy in prosody. For person-dependent recognition, we apply a voting method to combine the frame-based classification results from the audio and visual channels, yielding a 7.5% improvement over the best unimodal performance. For the person-independent test, we apply a multistream HMM to combine information from multiple component streams, yielding a 6.1% improvement over the best component performance.
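As a rough sketch of the two fusion schemes mentioned above, the Python snippet below illustrates (a) utterance-level majority voting over per-frame labels pooled from the audio and visual channels, and (b) a weighted combination of per-stream HMM log-likelihoods in the spirit of a multistream HMM. All names, example labels, and weights here are illustrative assumptions; the abstract does not specify the actual classifiers, stream weights, or tie-handling used in the paper.

```python
from collections import Counter
import numpy as np

def vote_fusion(audio_frame_labels, visual_frame_labels):
    """Hypothetical decision-level fusion: pool the per-frame labels
    produced by the audio and visual classifiers for one utterance and
    return the majority-voted affective state."""
    counts = Counter(audio_frame_labels) + Counter(visual_frame_labels)
    return counts.most_common(1)[0][0]

def multistream_score(stream_loglikes, stream_weights):
    """Hypothetical multistream-HMM combination: weighted sum of the
    per-stream log-likelihoods for one candidate state; the state with
    the highest combined score would be selected. Weights would
    normally be tuned on held-out data."""
    return float(np.dot(stream_weights, stream_loglikes))

# Example: per-frame labels from the two channels for one utterance.
audio = ["joy", "joy", "neutral", "joy"]
visual = ["joy", "surprise", "joy"]
print(vote_fusion(audio, visual))  # -> "joy"

# Example: combine face, pitch, and energy stream log-likelihoods
# for one candidate state with assumed weights.
print(multistream_score([-12.3, -9.8, -11.1], [0.5, 0.3, 0.2]))
```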
