Audio-Visual Spontaneous Emotion Recognition

Automatic multimodal recognition of spontaneous emotional expressions is a largely unexplored and challenging problem. In this paper, we explore audio-visual emotion recognition in a realistic human conversation setting, the Adult Attachment Interview (AAI). Based on the assumption that facial and vocal expressions convey the same coarse affective states, positive and negative emotion sequences are labeled according to the Facial Action Coding System. Facial texture in the visual channel and prosody in the audio channel are integrated in the framework of an AdaBoost multi-stream hidden Markov model (AdaMHMM), in which an AdaBoost learning scheme is used to fuse the component HMMs. Our approach is evaluated in AAI spontaneous emotion recognition experiments.
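To make the fusion idea concrete, the following is a minimal sketch of weighted multi-stream decision fusion over per-class HMM scores. The per-stream log-likelihoods and the stream weights below are hypothetical stand-ins: in the paper, the likelihoods would come from trained audio (prosody) and visual (facial texture) HMMs, and the weights from the AdaBoost learning scheme; this is not the authors' implementation.

```python
# Sketch: combine per-stream, per-class HMM log-likelihoods with stream
# weights (assumed learned by an AdaBoost-style scheme) and pick the
# highest-scoring emotion class.

def fuse_streams(stream_loglik, weights):
    """stream_loglik: {stream: {emotion_class: log-likelihood}}
    weights: {stream: non-negative weight}
    Returns the emotion class maximizing the weighted log-likelihood sum."""
    classes = next(iter(stream_loglik.values())).keys()
    scores = {
        c: sum(weights[s] * ll[c] for s, ll in stream_loglik.items())
        for c in classes
    }
    return max(scores, key=scores.get)

# Hypothetical scores: audio slightly favors "negative",
# video clearly favors "positive".
loglik = {
    "audio": {"positive": -12.0, "negative": -11.5},
    "video": {"positive": -8.0, "negative": -13.0},
}
weights = {"audio": 0.4, "video": 0.6}  # assumed boosted stream weights
print(fuse_streams(loglik, weights))  # -> positive
```

With these illustrative numbers, the confident visual stream outweighs the ambivalent audio stream, so the fused decision is "positive"; setting the audio weight to 1.0 would flip the decision to "negative".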
