Audio-visual affective expression recognition

Automatic affective expression recognition has attracted growing attention from researchers in different disciplines. It is expected to contribute significantly to a new paradigm for human-computer interaction (affect-sensitive interfaces, socially intelligent environments) and to advance research in affect-related fields, including psychology, psychiatry, and education. Multimodal information integration is a process that enables humans to assess affective states robustly and flexibly. To understand the richness and subtlety of human emotional behavior, a computer should likewise be able to integrate information from multiple sensors. In this paper we describe our efforts toward machine understanding of audio-visual affective behavior, based on both deliberate and spontaneous displays. We present several promising methods for integrating information from the audio and visual modalities. Our experiments show the advantage of audio-visual fusion over audio-only or visual-only approaches in affective expression recognition.
