Audio-visual affect recognition through multi-stream fused HMM for HCI

Advances in computer processing power and emerging algorithms are enabling new ways of envisioning human-computer interaction. This paper focuses on the development of a computing algorithm that uses audio and visual sensors to detect and track a user's affective state to aid computer decision making. Using our multi-stream fused hidden Markov model (MFHMM), we analyzed coupled audio and visual streams to detect 11 cognitive/emotive states. The MFHMM builds an optimal connection among multiple streams according to the maximum entropy principle and the maximum mutual information criterion. Person-independent experiments on 20 subjects and 660 sequences show that the MFHMM achieves an accuracy of 80.61%, outperforming the face-only HMM, pitch-only HMM, energy-only HMM, and independent HMM fusion.
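The MFHMM itself learns a coupling between the streams, which is not reproduced here. As a rough illustration of the late-fusion alternative the abstract compares against, the sketch below trains one HMM per stream and per affective state, then classifies a test sequence by a weighted sum of per-stream log-likelihoods. This is a minimal sketch of an independent-HMM-fusion baseline only, not the paper's MFHMM; hmmlearn is assumed as the HMM library, and the stream names, state subset, feature dimensions, and uniform weights are all illustrative.

```python
# Minimal sketch of independent HMM fusion: one GaussianHMM per
# (affective state, stream) pair, with classification by the weighted
# sum of per-stream log-likelihoods. All names below are hypothetical.
from hmmlearn import hmm

STREAMS = ["face", "pitch", "energy"]            # illustrative stream names
STATES = ["interest", "boredom", "frustration"]  # subset of the 11 states

def train_models(train_data, n_components=3):
    """train_data[state][stream]: (T, d) array of concatenated training
    features for that state and stream (a simplification; per-sequence
    lengths could be passed to fit() instead)."""
    models = {}
    for state in STATES:
        for stream in STREAMS:
            m = hmm.GaussianHMM(n_components=n_components,
                                covariance_type="diag", n_iter=50)
            m.fit(train_data[state][stream])
            models[(state, stream)] = m
    return models

def classify(models, obs, weights=None):
    """obs[stream]: (T, d) array of test features for one sequence.
    Independent fusion: sum per-stream log-likelihoods per state and
    pick the highest-scoring state."""
    weights = weights or {s: 1.0 for s in STREAMS}
    scores = {
        state: sum(weights[s] * models[(state, s)].score(obs[s])
                   for s in STREAMS)
        for state in STATES
    }
    return max(scores, key=scores.get)
```

Because the per-stream models are trained and scored separately, this baseline ignores audio-visual dependencies; the MFHMM's gain reported above comes precisely from modeling the connection between streams rather than treating them as independent.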
