A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions

Automated analysis of human affective behavior has attracted increasing attention from researchers in psychology, computer science, linguistics, neuroscience, and related disciplines. However, the existing methods typically handle only deliberately displayed and exaggerated expressions of prototypical emotions despite the fact that deliberate behaviour differs in visual appearance, audio profile, and timing from spontaneously occurring behaviour. To address this problem, efforts to develop algorithms that can process naturally occurring human affective behaviour have recently emerged. Moreover, an increasing number of efforts are reported toward multimodal fusion for human affect analysis including audiovisual fusion, linguistic and paralinguistic fusion, and multi-cue visual fusion based on facial expressions, head movements, and body gestures. This paper introduces and surveys these recent advances. We first discuss human emotion perception from a psychological perspective. Next we examine available approaches to solving the problem of machine understanding of human affective behavior, and discuss important issues like the collection and availability of training and test data. We finally outline some of the scientific and engineering challenges to advancing human affect sensing technology.

[1]  Christian D. Schunn,et al.  Integrating perceptual and cognitive modeling for adaptive and intelligent human-computer interaction , 2002, Proc. IEEE.

[2]  Zhi-Hua Zhou,et al.  Enhancing relevance feedback in image retrieval using unlabeled data , 2006, ACM Trans. Inf. Syst..

[3]  P. Lang,et al.  Affective judgment and psychophysiological response: Dimensional covariation in the evaluation of pictorial stimuli. , 1989 .

[4]  Björn Schuller,et al.  Affect-Robust Speech Recognition by Dynamic Emotional Adaptation , 2006 .

[5]  K. Stevens,et al.  Emotions and speech: some acoustical correlates. , 1972, The Journal of the Acoustical Society of America.

[6]  Jun Wang,et al.  3D Facial Expression Recognition Based on Primitive Surface Feature Distribution , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[7]  Peter Robinson,et al.  Real-Time Inference of Complex Mental States from Facial Expressions and Head Gestures , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[8]  J. G. Taylor,et al.  Emotion recognition in human-computer interaction , 2005, Neural Networks.

[9]  Diane J. Litman,et al.  Predicting Student Emotions in Computer-Human Tutoring Dialogues , 2004, ACL.

[10]  Maja Pantic,et al.  Automatic Analysis of Facial Expressions: The State of the Art , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[11]  P. Ekman Universals and cultural differences in facial expressions of emotion. , 1972 .

[12]  Bernd Kleinjohann,et al.  Prosody based emotion recognition for MEXI , 2005, 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[13]  Kenji Mase,et al.  Recognition of Facial Expression from Optical Flow , 1991 .

[14]  David A. van Leeuwen,et al.  Automatic discrimination between laughter and speech , 2007, Speech Commun..

[15]  Maja Pantic,et al.  Case-based reasoning for user-profiled recognition of emotions from face images , 2004, 2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763).

[16]  Zhihong Zeng,et al.  Audio-Visual Affect Recognition , 2007, IEEE Transactions on Multimedia.

[17]  Laurence Devillers,et al.  Real-life emotions detection with lexical and paralinguistic cues on human-human call center dialogs , 2006, INTERSPEECH.

[18]  Dae-Jong Lee,et al.  Emotion recognition from the facial image and speech signal , 2003, SICE 2003 Annual Conference (IEEE Cat. No.03TH8734).

[19]  Tsutomu Miyasato,et al.  Multimodal human emotion/expression recognition , 1998, Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition.

[20]  Peter W. McOwan,et al.  A real-time automated system for the recognition of human facial expressions , 2006, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[21]  P. Ekman,et al.  Facial Expressions of Emotion , 1979 .

[22]  Jeffrey F. Cohn,et al.  Foundations of human computing: facial expression and emotion , 2006, ICMI '06.

[23]  Andreas Stolcke,et al.  Prosody-based automatic detection of annoyance and frustration in human-computer dialog , 2002, INTERSPEECH.

[24]  Thomas S. Huang,et al.  Capturing subtle facial motions in 3D face tracking , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[25]  Nicu Sebe,et al.  Affective multimodal human-computer interaction , 2005, ACM Multimedia.

[26]  Hatice Gunes,et al.  A Bimodal Face and Body Gesture Database for Automatic Analysis of Human Nonverbal Affective Behavior , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[27]  Mohammed Yeasin,et al.  Recognition of facial expressions and measurement of levels of interest from video , 2006, IEEE Transactions on Multimedia.

[28]  Maja Pantic,et al.  Gaze-X: adaptive affective multimodal interface for single-user office scenarios , 2006, ICMI '06.

[29]  Andreas Stolcke,et al.  Distinguishing deceptive from non-deceptive speech , 2005, INTERSPEECH.

[30]  Roddy Cowie,et al.  Beyond emotion archetypes: Databases for emotion modelling using neural networks , 2005, Neural Networks.

[31]  Rogério Schmidt Feris,et al.  Manifold Based Analysis of Facial Expression , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[32]  P. Ekman,et al.  What the face reveals : basic and applied studies of spontaneous expression using the facial action coding system (FACS) , 2005 .

[33]  Shrikanth S. Narayanan,et al.  Emotion recognition using a data-driven fuzzy inference system , 2003, INTERSPEECH.

[34]  Alex Pentland Socially Aware Computation and Communication , 2005, Computer.

[35]  George N. Votsis,et al.  Emotion recognition in human-computer interaction , 2001, IEEE Signal Process. Mag..

[36]  Qiang Ji,et al.  Facial Action Unit Recognition by Exploiting Their Dynamic and Semantic Relationships , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  P. Ekman,et al.  Facial Expression in Affective Disorders , 2005 .

[38]  Shrikanth S. Narayanan,et al.  Toward detecting emotions in spoken dialogs , 2005, IEEE Transactions on Speech and Audio Processing.

[39]  Rosalind W. Picard Affective Computing , 1997 .

[40]  Lori Lamel,et al.  Challenges in real-life emotion annotation and machine learning based detection , 2005, Neural Networks.

[41]  Ludmila I. Kuncheva,et al.  Combining Pattern Classifiers: Methods and Algorithms , 2004 .

[42]  Tsuhan Chen,et al.  The painful face - Pain expression recognition using active appearance models , 2009, Image Vis. Comput..

[43]  Narendra Ahuja,et al.  Facial expression decomposition , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[44]  Diane J. Litman,et al.  Predicting Emotion in Spoken Dialogue from Multiple Knowledge Sources , 2004, NAACL.

[45]  Zhihong Zeng,et al.  Audio-visual affect recognition through multi-stream fused HMM for HCI , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[46]  Zhiwei Zhu,et al.  Robust Real-Time Face Pose and Facial Expression Recovery , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[47]  David Sander,et al.  A systems approach to appraisal mechanisms in emotion , 2005, Neural Networks.

[48]  Nicu Sebe,et al.  Semisupervised learning of classifiers: theory, algorithms, and their application to human-computer interaction , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[49]  Maja Pantic,et al.  Facial action recognition for facial expression analysis from static face images , 2004, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[50]  Maja Pantic,et al.  Motion history for facial action detection in video , 2004, 2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No.04CH37583).

[51]  David H. Evans,et al.  Detection of cough signals in continuous audio recordings using hidden Markov models , 2006, IEEE Transactions on Biomedical Engineering.

[52]  M. Turk,et al.  Probabilistic expression analysis on manifolds , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[53]  Laurence Devillers,et al.  Reliability of Lexical and Prosodic Cues in Two Real-life Spoken Dialog Corpora , 2004, LREC.

[54]  Loïc Kessous,et al.  Modeling naturalistic affective states via facial and vocal expressions recognition , 2006, ICMI '06.

[55]  Stephen E. Levinson,et al.  Children's emotion recognition in an intelligent tutoring scenario , 2004, INTERSPEECH.

[56]  Gwen Littlewort,et al.  A Prototype for Automatic Recognition of Spontaneous Facial Actions , 2002, NIPS.

[57]  Jun Wang,et al.  A 3D facial expression database for facial behavior research , 2006, 7th International Conference on Automatic Face and Gesture Recognition (FGR06).

[58]  Amanda C.C. Williams,et al.  Facial expression of pain: An evolutionary account , 2002, Behavioral and Brain Sciences.

[59]  Zhigang Deng,et al.  Analysis of emotion recognition using facial expressions, speech and multimodal information , 2004, ICMI '04.

[60]  Hatice Gunes,et al.  How to distinguish posed from spontaneous smiles using geometric features , 2007, ICMI '07.

[61]  Rana El Kaliouby,et al.  Self-Cam: feedback from what would be your social partner , 2006, SIGGRAPH '06.

[62]  Massimo Piccardi,et al.  Fusing Face and Body Display for Bi-modal Emotion Recognition: Single Frame Analysis and Multi-frame Post Integration , 2005, ACII.

[63]  Nicu Sebe,et al.  Facial expression recognition from video sequences: temporal and static modeling , 2003, Comput. Vis. Image Underst..

[64]  Ashish Kapoor,et al.  Automatic prediction of frustration , 2007, Int. J. Hum. Comput. Stud..

[65]  Maja Pantic,et al.  Dynamics of facial expression: recognition of facial actions and their temporal segments from face profile image sequences , 2006, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[66]  Ahmed M. Elgammal,et al.  Facial Expression Analysis Using Nonlinear Decomposable Generative Models , 2005, AMFG.

[67]  Qiang Ji,et al.  Active and dynamic information fusion for facial expression understanding from image sequences , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[68]  J. Russell,et al.  Evidence for a three-factor theory of emotions , 1977 .

[69]  Jacob Whitehill,et al.  Haar features for FACS AU recognition , 2006, 7th International Conference on Automatic Face and Gesture Recognition (FGR06).

[70]  Oh-Wook Kwon,et al.  EMOTION RECOGNITION BY SPEECH SIGNAL , 2003 .

[71]  Kornel Laskowski,et al.  Emotion recognition in spontaneous speech using GMMs , 2006, INTERSPEECH.

[72]  Gwen Littlewort,et al.  Faces of pain: automated measurement of spontaneousallfacial expressions of genuine and posed pain , 2007, ICMI '07.

[73]  Cynthia Whissell,et al.  THE DICTIONARY OF AFFECT IN LANGUAGE , 1989 .

[74]  Hatice Gunes,et al.  Affect recognition from face and body: early fusion vs. late fusion , 2005, 2005 IEEE International Conference on Systems, Man and Cybernetics.

[75]  Glenn I. Roisman,et al.  The emotional integration of childhood experience: physiological, facial expressive, and self-reported emotional response during the adult attachment interview. , 2004, Developmental psychology.

[76]  Daniel Gatica-Perez,et al.  Latent semantic analysis of facial action codes for automatic facial expression recognition , 2004, MIR '04.

[77]  Qiang Ji,et al.  A probabilistic framework for modeling and real-time monitoring human fatigue , 2006, IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans.

[78]  L. Rothkrantz,et al.  Toward an affect-sensitive multimodal human-computer interaction , 2003, Proc. IEEE.

[79]  Ioannis Pitas,et al.  Facial Expression Recognition in Image Sequences Using Geometric Deformation Features and Support Vector Machines , 2007, IEEE Transactions on Image Processing.

[80]  Claude-Nicolas Fiechter,et al.  Efficient reinforcement learning , 1994, COLT '94.

[81]  Ananth N. Iyer,et al.  Emotion Detection From Infant Facial Expressions And Cries , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[82]  Gerhard Rigoll,et al.  Bimodal fusion of emotional data in an automotive environment , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[83]  Nicu Sebe,et al.  Authentic Facial Expression Analysis , 2004, FGR.

[84]  Lawrence S. Chen,et al.  Joint processing of audio-visual information for the recognition of emotional expressions in human-computer interaction , 2000 .

[85]  Jing Xiao,et al.  Automatic analysis and recognition of brow actions and head motion in spontaneous facial behavior , 2004, 2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No.04CH37583).

[86]  Loïc Kessous,et al.  Modeling Naturalistic Affective States Via Facial, Vocal, and Bodily Expressions Recognition , 2007, Artifical Intelligence for Human Computing.

[87]  Elmar Nöth,et al.  How to find trouble in communication , 2003, Speech Commun..

[88]  J. Russell,et al.  Facial and vocal expressions of emotion. , 2003, Annual review of psychology.

[89]  Yuxiao Hu,et al.  Spontaneous Emotional Facial Expression Detection , 2006, J. Multim..

[90]  Alex Pentland,et al.  Human computing and machine understanding of human behavior: a survey , 2006, ICMI '06.

[91]  Gwen Littlewort,et al.  Fully Automatic Facial Action Recognition in Spontaneous Behavior , 2006, 7th International Conference on Automatic Face and Gesture Recognition (FGR06).

[92]  Nicu Sebe,et al.  Emotion Recognition Based on Joint Visual and Audio Cues , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[93]  Zhihong Zeng,et al.  A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[94]  Beat Fasel,et al.  Automati Fa ial Expression Analysis: A Survey , 1999 .

[95]  Maja Pantic,et al.  Audiovisual discrimination between laughter and speech , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[96]  J. Cohn,et al.  Mother–infant face-to-face interaction: The sequence of dyadic states at 3, 6, and 9 months. , 1987 .

[97]  Frank Dellaert,et al.  Recognizing emotion in speech , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[98]  Alice J. O'Toole,et al.  A video database of moving faces and people , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[99]  Simon Lucey,et al.  Investigating Spontaneous Facial Action Recognition through AAM Representations of the Face , 2007 .

[100]  Nicu Sebe,et al.  Multimodal Human Computer Interaction: A Survey , 2005, ICCV-HCI.

[101]  Jeffrey F. Cohn,et al.  The Timing of Facial Motion in posed and Spontaneous Smiles , 2003, Int. J. Wavelets Multiresolution Inf. Process..

[102]  Kornel Laskowski,et al.  Annotation and Analysis of Emotionally Relevant Behavior in the ISL Meeting Corpus , 2006, LREC.

[103]  Kostas Karpouzis,et al.  Emotion recognition through facial expression analysis based on a neurofuzzy network , 2005, Neural Networks.

[104]  D. Keltner Signs of appeasement: evidence for the distinct displays of embarrassment, amusement, and shame , 1995 .

[105]  K. Scherer,et al.  Acoustic profiles in vocal emotion expression. , 1996, Journal of personality and social psychology.

[106]  Oudeyer Pierre-Yves,et al.  The production and recognition of emotions in speech: features and algorithms , 2003 .

[107]  Zhihong Zeng,et al.  Bimodal HCI-related affect recognition , 2004, ICMI '04.

[108]  K. Scherer,et al.  Vocal expression of affect , 2005 .

[109]  Ashish Kapoor,et al.  Multimodal affect recognition in learning environments , 2005, ACM Multimedia.

[110]  Julia Hirschberg,et al.  Detecting certainness in spoken tutorial dialogues , 2005, INTERSPEECH.

[111]  Thomas S. Huang,et al.  Explanation-based facial motion tracking using a piecewise Bezier volume deformation model , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[112]  PanticMaja,et al.  A Survey of Affect Recognition Methods , 2009 .

[113]  Susanne Burger,et al.  The ISL meeting corpus: the impact of meeting type on speech style , 2002, INTERSPEECH.

[114]  Elmar Nöth,et al.  "Of all things the measure is man" automatic classification of emotions and inter-labeler consistency [speech-based emotion recognition] , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[115]  P. Ekman,et al.  Strong evidence for universals in facial expressions: a reply to Russell's mistaken critique. , 1994, Psychological bulletin.

[116]  John J. B. Allen,et al.  The handbook of emotion elicitation and assessment , 2007 .

[117]  Roddy Cowie,et al.  FEELTRACE: an instrument for recording perceived emotion in real time , 2000 .

[118]  Valérie Maffiolo,et al.  A study on the automatic detection and characterization of emotion in a voice service context , 2005, INTERSPEECH.

[119]  Guodong Guo,et al.  Learning from examples in the small sample case: face expression recognition , 2005, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[120]  Yuxiao Hu,et al.  Audio-Visual Spontaneous Emotion Recognition , 2007, Artifical Intelligence for Human Computing.

[121]  Björn W. Schuller,et al.  Meta-classifiers in acoustic and linguistic feature fusion-based affect recognition , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[122]  Fumio Hara,et al.  The recognition of basic facial expressions by neural network , 1991, [Proceedings] 1991 IEEE International Joint Conference on Neural Networks.

[123]  Takeo Kanade,et al.  Comprehensive database for facial expression analysis , 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580).

[124]  N. Ambady,et al.  Thin slices of expressive behavior as predictors of interpersonal consequences: A meta-analysis. , 1992 .

[125]  Jing Xiao,et al.  Robust full‐motion recovery of head by dynamic templates and re‐registration techniques , 2003 .

[126]  Zhihong Zeng,et al.  Audio-visual affect recognition in activation-evaluation space , 2005, 2005 IEEE International Conference on Multimedia and Expo.

[127]  Ling Guan,et al.  Recognizing human emotion from audiovisual information , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[128]  D. Watson,et al.  Development and validation of brief measures of positive and negative affect: the PANAS scales. , 1988, Journal of personality and social psychology.

[129]  Björn W. Schuller,et al.  Audiovisual recognition of spontaneous interest within conversations , 2007, ICMI '07.

[130]  Takeo Kanade,et al.  Facial Expression Analysis , 2011, AMFG.

[131]  D Sherwin,et al.  Faces of pain. , 1989, Advancing clinical care : official journal of NOAADN.

[132]  Roddy Cowie,et al.  Emotional speech: Towards a new generation of databases , 2003, Speech Commun..

[133]  Maja Pantic,et al.  Spontaneous vs. posed facial behavior: automatic analysis of brow actions , 2006, ICMI '06.

[134]  Ashok Samal,et al.  Automatic recognition and analysis of human faces and facial expressions: a survey , 1992, Pattern Recognit..

[135]  Maja Pantic,et al.  Web-based database for facial expression analysis , 2005, 2005 IEEE International Conference on Multimedia and Expo.

[136]  Susan T. Dumais,et al.  The vocabulary problem in human-system communication , 1987, CACM.

[137]  M. Bartlett,et al.  Machine Analysis of Facial Expressions , 2007 .

[138]  Nicu Sebe,et al.  MULTIMODAL EMOTION RECOGNITION , 2005 .

[139]  Roddy Cowie,et al.  ASR for emotional speech: Clarifying the issues and enhancing performance , 2005, Neural Networks.

[140]  Andreas Stolcke,et al.  Combining Prosodic Lexical and Cepstral Systems for Deceptive Speech Detection , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[141]  Chun Chen,et al.  Audio-visual based emotion recognition - a new approach , 2004, CVPR 2004.

[142]  Kostas Karpouzis,et al.  Emotion Analysis in Man-Machine Interaction Systems , 2004, MLMI.

[143]  Christine L. Lisetti,et al.  MAUI: a multimodal affective user interface , 2002, MULTIMEDIA '02.

[144]  Gwen Littlewort,et al.  Recognizing facial expression: machine learning and application to spontaneous behavior , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[145]  Yuxiao Hu,et al.  Training combination strategy of multi-stream fused hidden Markov model for audio-visual affect recognition , 2006, MM '06.

[146]  Qiang Ji,et al.  An automated face reader for fatigue detection , 2004, Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings..

[147]  Laurence Devillers,et al.  Detection of real-life emotions in call centers , 2005, INTERSPEECH.

[148]  Luiz Velho,et al.  Automatic 3D Facial Expression Analysis in Videos , 2005, AMFG.