A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions

Automated analysis of human affective behavior has attracted increasing attention from researchers in psychology, computer science, linguistics, neuroscience, and related disciplines. However, the existing methods typically handle only deliberately displayed and exaggerated expressions of prototypical emotions despite the fact that deliberate behaviour differs in visual appearance, audio profile, and timing from spontaneously occurring behaviour. To address this problem, efforts to develop algorithms that can process naturally occurring human affective behaviour have recently emerged. Moreover, an increasing number of efforts are reported toward multimodal fusion for human affect analysis including audiovisual fusion, linguistic and paralinguistic fusion, and multi-cue visual fusion based on facial expressions, head movements, and body gestures. This paper introduces and surveys these recent advances. We first discuss human emotion perception from a psychological perspective. Next we examine available approaches to solving the problem of machine understanding of human affective behavior, and discuss important issues like the collection and availability of training and test data. We finally outline some of the scientific and engineering challenges to advancing human affect sensing technology.

[1]  Tsuhan Chen,et al.  The painful face - Pain expression recognition using active appearance models , 2009, Image Vis. Comput..

[2]  Qiang Ji,et al.  Active and dynamic information fusion for facial expression understanding from image sequences , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Elmar Nöth,et al.  “You Stupid Tin Box” - Children Interacting with the AIBO Robot: A Cross-linguistic Emotional Speech Corpus , 2004, LREC.

[4]  Zhihong Zeng,et al.  Audio-visual affect recognition through multi-stream fused HMM for HCI , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[5]  Zhiwei Zhu,et al.  Robust Real-Time Face Pose and Facial Expression Recovery , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[6]  Diane J. Litman,et al.  Predicting Emotion in Spoken Dialogue from Multiple Knowledge Sources , 2004, NAACL.

[7]  David Sander,et al.  A systems approach to appraisal mechanisms in emotion , 2005, Neural Networks.

[8]  Zhigang Deng,et al.  Analysis of emotion recognition using facial expressions, speech and multimodal information , 2004, ICMI '04.

[9]  David H. Evans,et al.  Detection of cough signals in continuous audio recordings using hidden Markov models , 2006, IEEE Transactions on Biomedical Engineering.

[10]  J. Russell,et al.  Evidence for a three-factor theory of emotions , 1977 .

[11]  Roddy Cowie,et al.  Emotional speech: Towards a new generation of databases , 2003, Speech Commun..

[12]  Qiang Ji,et al.  A probabilistic framework for modeling and real-time monitoring human fatigue , 2006, IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans.

[13]  Oh-Wook Kwon,et al.  EMOTION RECOGNITION BY SPEECH SIGNAL , 2003 .

[14]  Alice J. O'Toole,et al.  A video database of moving faces and people , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  K. Scherer,et al.  Acoustic profiles in vocal emotion expression. , 1996, Journal of personality and social psychology.

[16]  Oudeyer Pierre-Yves,et al.  The production and recognition of emotions in speech: features and algorithms , 2003 .

[17]  Fernando De la Torre,et al.  Facial Expression Analysis , 2011, Visual Analysis of Humans.

[18]  Nicu Sebe,et al.  Facial expression recognition from video sequences: temporal and static modeling , 2003, Comput. Vis. Image Underst..

[19]  Pierre-Yves Oudeyer,et al.  The production and recognition of emotions in speech: features and algorithms , 2003, Int. J. Hum. Comput. Stud..

[20]  Hatice Gunes,et al.  How to distinguish posed from spontaneous smiles using geometric features , 2007, ICMI '07.

[21]  Beat Fasel,et al.  Automati Fa ial Expression Analysis: A Survey , 1999 .

[22]  Maja Pantic,et al.  Audiovisual discrimination between laughter and speech , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[23]  Rana El Kaliouby,et al.  Self-Cam: feedback from what would be your social partner , 2006, SIGGRAPH '06.

[24]  Massimo Piccardi,et al.  Fusing Face and Body Display for Bi-modal Emotion Recognition: Single Frame Analysis and Multi-frame Post Integration , 2005, ACII.

[25]  David A. van Leeuwen,et al.  Automatic detection of laughter , 2005, INTERSPEECH.

[26]  Frank Dellaert,et al.  Recognizing emotion in speech , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[27]  Yuxiao Hu,et al.  Training combination strategy of multi-stream fused hidden Markov model for audio-visual affect recognition , 2006, MM '06.

[28]  Tsutomu Miyasato,et al.  Multimodal human emotion/expression recognition , 1998, Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition.

[29]  Peter W. McOwan,et al.  A real-time automated system for the recognition of human facial expressions , 2006, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[30]  Jeffrey F. Cohn,et al.  Foundations of human computing: facial expression and emotion , 2006, ICMI '06.

[31]  K. Scherer,et al.  Vocal expression of affect , 2005 .

[32]  Andreas Stolcke,et al.  Prosody-based automatic detection of annoyance and frustration in human-computer dialog , 2002, INTERSPEECH.

[33]  Qiang Ji,et al.  An automated face reader for fatigue detection , 2004, Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings..

[34]  Thomas S. Huang,et al.  Capturing subtle facial motions in 3D face tracking , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[35]  Nicu Sebe,et al.  Affective multimodal human-computer interaction , 2005, ACM Multimedia.

[36]  Ashish Kapoor,et al.  Multimodal affect recognition in learning environments , 2005, ACM Multimedia.

[37]  Laurence Devillers,et al.  Detection of real-life emotions in call centers , 2005, INTERSPEECH.

[38]  Hatice Gunes,et al.  A Bimodal Face and Body Gesture Database for Automatic Analysis of Human Nonverbal Affective Behavior , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[39]  Julia Hirschberg,et al.  Detecting certainness in spoken tutorial dialogues , 2005, INTERSPEECH.

[40]  Luiz Velho,et al.  Automatic 3D Facial Expression Analysis in Videos , 2005, AMFG.

[41]  B. Stein,et al.  The Merging of the Senses , 1993 .

[42]  Rogério Schmidt Feris,et al.  Manifold Based Analysis of Facial Expression , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[43]  Yuxiao Hu,et al.  Spontaneous Emotional Facial Expression Detection , 2006, J. Multim..

[44]  Alex Pentland,et al.  Human computing and machine understanding of human behavior: a survey , 2006, ICMI '06.

[45]  Maja Pantic,et al.  Dynamics of facial expression: recognition of facial actions and their temporal segments from face profile image sequences , 2006, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[46]  Elmar Nöth,et al.  AUTOMATIC CLASSIFICATION OF EMOTIONS AND INTER-LABELER CONSISTENCY , .

[47]  Björn W. Schuller,et al.  Audiovisual recognition of spontaneous interest within conversations , 2007, ICMI '07.

[48]  Takeo Kanade,et al.  Facial Expression Analysis , 2011, AMFG.

[49]  Jacob Whitehill,et al.  Haar features for FACS AU recognition , 2006, 7th International Conference on Automatic Face and Gesture Recognition (FGR06).

[50]  D Sherwin,et al.  Faces of pain. , 1989, Advancing clinical care : official journal of NOAADN.

[51]  P. Ekman,et al.  What the face reveals : basic and applied studies of spontaneous expression using the facial action coding system (FACS) , 2005 .

[52]  Gwen Littlewort,et al.  Fully Automatic Facial Action Recognition in Spontaneous Behavior , 2006, 7th International Conference on Automatic Face and Gesture Recognition (FGR06).

[53]  Gwen Littlewort,et al.  Faces of pain: automated measurement of spontaneousallfacial expressions of genuine and posed pain , 2007, ICMI '07.

[54]  Maja Pantic,et al.  Spontaneous vs. posed facial behavior: automatic analysis of brow actions , 2006, ICMI '06.

[55]  Maja Pantic,et al.  Gaze-X: adaptive affective multimodal interface for single-user office scenarios , 2006, ICMI '06.

[56]  Ashok Samal,et al.  Automatic recognition and analysis of human faces and facial expressions: a survey , 1992, Pattern Recognit..

[57]  Andreas Stolcke,et al.  Distinguishing deceptive from non-deceptive speech , 2005, INTERSPEECH.

[58]  Bernd Kleinjohann,et al.  Prosody based emotion recognition for MEXI , 2005, 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[59]  Kenji Mase,et al.  Recognition of Facial Expression from Optical Flow , 1991 .

[60]  Yuxiao Hu,et al.  Audio-visual emotion recognition in adult attachment interview , 2006, ICMI '06.

[61]  David A. van Leeuwen,et al.  Automatic discrimination between laughter and speech , 2007, Speech Commun..

[62]  Ashish Kapoor,et al.  Automatic prediction of frustration , 2007, Int. J. Hum. Comput. Stud..

[63]  J. Cohn,et al.  Mother–infant face-to-face interaction: The sequence of dyadic states at 3, 6, and 9 months. , 1987 .

[64]  Changbo Hu,et al.  Probabilistic expression analysis on manifolds , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[65]  Glenn I. Roisman,et al.  The emotional integration of childhood experience: physiological, facial expressive, and self-reported emotional response during the adult attachment interview. , 2004, Developmental psychology.

[66]  Daniel Gatica-Perez,et al.  Latent semantic analysis of facial action codes for automatic facial expression recognition , 2004, MIR '04.

[67]  Maja Pantic,et al.  Automatic Analysis of Facial Expressions: The State of the Art , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[68]  P. Ekman Universals and cultural differences in facial expressions of emotion. , 1972 .

[69]  Thomas S. Huang,et al.  Explanation-based facial motion tracking using a piecewise Bezier volume deformation model , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[70]  Zhihong Zeng,et al.  Audio-Visual Affect Recognition , 2007, IEEE Transactions on Multimedia.

[71]  Laurence Devillers,et al.  Real-life emotions detection with lexical and paralinguistic cues on human-human call center dialogs , 2006, INTERSPEECH.

[72]  Dae-Jong Lee,et al.  Emotion recognition from the facial image and speech signal , 2003, SICE 2003 Annual Conference (IEEE Cat. No.03TH8734).

[73]  Mohammed Yeasin,et al.  Recognition of facial expressions and measurement of levels of interest from video , 2006, IEEE Transactions on Multimedia.

[74]  Ling Guan,et al.  Recognizing human emotion from audiovisual information , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[75]  D. Watson,et al.  Development and validation of brief measures of positive and negative affect: the PANAS scales. , 1988, Journal of personality and social psychology.

[76]  Maja Pantic,et al.  Web-based database for facial expression analysis , 2005, 2005 IEEE International Conference on Multimedia and Expo.

[77]  Nicu Sebe,et al.  Multimodal Human Computer Interaction: A Survey , 2005, ICCV-HCI.

[78]  Susan T. Dumais,et al.  The vocabulary problem in human-system communication , 1987, CACM.

[79]  M. Bartlett,et al.  Machine Analysis of Facial Expressions , 2007 .

[80]  R. Plutchik Emotion, a psychoevolutionary synthesis , 1980 .

[81]  Roddy Cowie,et al.  ASR for emotional speech: Clarifying the issues and enhancing performance , 2005, Neural Networks.

[82]  Andreas Stolcke,et al.  Combining Prosodic Lexical and Cepstral Systems for Deceptive Speech Detection , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[83]  Zhi-Hua Zhou,et al.  Enhancing relevance feedback in image retrieval using unlabeled data , 2006, ACM Trans. Inf. Syst..

[84]  P. Lang,et al.  Affective judgment and psychophysiological response: Dimensional covariation in the evaluation of pictorial stimuli. , 1989 .

[85]  K. Stevens,et al.  Emotions and speech: some acoustical correlates. , 1972, The Journal of the Acoustical Society of America.

[86]  Jun Wang,et al.  3D Facial Expression Recognition Based on Primitive Surface Feature Distribution , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[87]  Peter Robinson,et al.  Real-Time Inference of Complex Mental States from Facial Expressions and Head Gestures , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[88]  Nicu Sebe,et al.  Emotion Recognition Based on Joint Visual and Audio Cues , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[89]  P. Ekman Emotion in the human face , 1982 .

[90]  Stephen E. Levinson,et al.  Children's emotion recognition in an intelligent tutoring scenario , 2004, INTERSPEECH.

[91]  J. G. Taylor,et al.  Emotion recognition in human-computer interaction , 2005, Neural Networks.

[92]  Diane J. Litman,et al.  Predicting Student Emotions in Computer-Human Tutoring Dialogues , 2004, ACL.

[93]  Roddy Cowie,et al.  Beyond emotion archetypes: Databases for emotion modelling using neural networks , 2005, Neural Networks.

[94]  George N. Votsis,et al.  Emotion recognition in human-computer interaction , 2001, IEEE Signal Process. Mag..

[95]  Loïc Kessous,et al.  Modeling Naturalistic Affective States Via Facial, Vocal, and Bodily Expressions Recognition , 2007, Artifical Intelligence for Human Computing.

[96]  Jing Xiao,et al.  Robust full-motion recovery of head by dynamic templates and re-registration techniques , 2002, Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition.

[97]  Elmar Nöth,et al.  How to find trouble in communication , 2003, Speech Commun..

[98]  J. Russell,et al.  Facial and vocal expressions of emotion. , 2003, Annual review of psychology.

[99]  P. Ekman,et al.  Facial Expression in Affective Disorders , 2005 .

[100]  Fumio Hara,et al.  The recognition of basic facial expressions by neural network , 1991, [Proceedings] 1991 IEEE International Joint Conference on Neural Networks.

[101]  Takeo Kanade,et al.  Comprehensive database for facial expression analysis , 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580).

[102]  Hatice Gunes,et al.  Affect recognition from face and body: early fusion vs. late fusion , 2005, 2005 IEEE International Conference on Systems, Man and Cybernetics.

[103]  Susanne Burger,et al.  The ISL meeting corpus: the impact of meeting type on speech style , 2002, INTERSPEECH.

[104]  Elmar Nöth,et al.  "Of all things the measure is man" automatic classification of emotions and inter-labeler consistency [speech-based emotion recognition] , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[105]  P. Ekman,et al.  Strong evidence for universals in facial expressions: a reply to Russell's mistaken critique. , 1994, Psychological bulletin.

[106]  John J. B. Allen,et al.  The handbook of emotion elicitation and assessment , 2007 .

[107]  Roddy Cowie,et al.  FEELTRACE: an instrument for recording perceived emotion in real time , 2000 .

[108]  L. Rothkrantz,et al.  Toward an affect-sensitive multimodal human-computer interaction , 2003, Proc. IEEE.

[109]  Nirbhay N. Singh,et al.  Facial Expressions of Emotion , 1998 .

[110]  D. Keltner Signs of appeasement: evidence for the distinct displays of embarrassment, amusement, and shame , 1995 .

[111]  Ahmed M. Elgammal,et al.  Facial Expression Analysis Using Nonlinear Decomposable Generative Models , 2005, AMFG.

[112]  Kornel Laskowski,et al.  Emotion recognition in spontaneous speech using GMMs , 2006, INTERSPEECH.

[113]  ZhouZhi-Hua,et al.  Enhancing relevance feedback in image retrieval using unlabeled data , 2006 .

[114]  Cynthia Whissell,et al.  THE DICTIONARY OF AFFECT IN LANGUAGE , 1989 .

[115]  Ioannis Pitas,et al.  Facial Expression Recognition in Image Sequences Using Geometric Deformation Features and Support Vector Machines , 2007, IEEE Transactions on Image Processing.

[116]  Claude-Nicolas Fiechter,et al.  Efficient reinforcement learning , 1994, COLT '94.

[117]  Christian D. Schunn,et al.  Integrating perceptual and cognitive modeling for adaptive and intelligent human-computer interaction , 2002, Proc. IEEE.

[118]  Björn Schuller,et al.  Affect-Robust Speech Recognition by Dynamic Emotional Adaptation , 2006 .

[119]  Maja Pantic,et al.  Case-based reasoning for user-profiled recognition of emotions from face images , 2004, 2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763).

[120]  Valérie Maffiolo,et al.  A study on the automatic detection and characterization of emotion in a voice service context , 2005, INTERSPEECH.

[121]  Guodong Guo,et al.  Learning from examples in the small sample case: face expression recognition , 2005, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[122]  Yuxiao Hu,et al.  Audio-Visual Spontaneous Emotion Recognition , 2007, Artifical Intelligence for Human Computing.

[123]  Björn W. Schuller,et al.  Meta-classifiers in acoustic and linguistic feature fusion-based affect recognition , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[124]  Jiucang Hao,et al.  Emotion recognition by speech signals , 2003, INTERSPEECH.

[125]  Simon Lucey,et al.  Investigating Spontaneous Facial Action Recognition through AAM Representations of the Face , 2007 .

[126]  Jeffrey F. Cohn,et al.  The Timing of Facial Motion in posed and Spontaneous Smiles , 2003, Int. J. Wavelets Multiresolution Inf. Process..

[127]  Kornel Laskowski,et al.  Annotation and Analysis of Emotionally Relevant Behavior in the ISL Meeting Corpus , 2006, LREC.

[128]  Kostas Karpouzis,et al.  Emotion recognition through facial expression analysis based on a neurofuzzy network , 2005, Neural Networks.

[129]  Lori Lamel,et al.  Challenges in real-life emotion annotation and machine learning based detection , 2005, Neural Networks.

[130]  Narendra Ahuja,et al.  Facial expression decomposition , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[131]  N. Ambady,et al.  Thin slices of expressive behavior as predictors of interpersonal consequences: A meta-analysis. , 1992 .

[132]  S. Mozziconacci Prosody and emotions , 2002, Proceedings of the International Conference on Speech Prosody.

[133]  Zhihong Zeng,et al.  Audio-visual affect recognition in activation-evaluation space , 2005, 2005 IEEE International Conference on Multimedia and Expo.

[134]  Kostas Karpouzis,et al.  Emotion Analysis in Man-Machine Interaction Systems , 2004, MLMI.

[135]  Christine L. Lisetti,et al.  MAUI: a multimodal affective user interface , 2002, MULTIMEDIA '02.

[136]  Gwen Littlewort,et al.  Recognizing facial expression: machine learning and application to spontaneous behavior , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[137]  C. H. Chen,et al.  Handbook of Pattern Recognition and Computer Vision , 1993 .

[138]  Jun Wang,et al.  A 3D facial expression database for facial behavior research , 2006, 7th International Conference on Automatic Face and Gesture Recognition (FGR06).

[139]  K. Scherer,et al.  The New Handbook of Methods in Nonverbal Behavior Research , 2008 .

[140]  Zhihong Zeng,et al.  A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions , 2009, IEEE Trans. Pattern Anal. Mach. Intell..

[141]  Amanda C.C. Williams,et al.  Facial expression of pain: An evolutionary account , 2002, Behavioral and Brain Sciences.

[142]  R. Gross Face Databases , 2005 .

[143]  Qiang Ji,et al.  Facial Action Unit Recognition by Exploiting Their Dynamic and Semantic Relationships , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[144]  Shrikanth S. Narayanan,et al.  Toward detecting emotions in spoken dialogs , 2005, IEEE Transactions on Speech and Audio Processing.

[145]  Rosalind W. Picard Affective Computing , 1997 .

[146]  R. Gibson,et al.  What the Face Reveals , 2002 .

[147]  Loïc Kessous,et al.  Modeling naturalistic affective states via facial and vocal expressions recognition , 2006, ICMI '06.

[148]  Gwen Littlewort,et al.  A Prototype for Automatic Recognition of Spontaneous Facial Actions , 2002, NIPS.

[149]  Shrikanth S. Narayanan,et al.  Emotion recognition using a data-driven fuzzy inference system , 2003, INTERSPEECH.

[150]  Nicu Sebe,et al.  Semisupervised learning of classifiers: theory, algorithms, and their application to human-computer interaction , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[151]  Nicu Sebe,et al.  Authentic facial expression analysis , 2004, Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings..

[152]  Maja Pantic,et al.  Facial action recognition for facial expression analysis from static face images , 2004, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[153]  Chun Chen,et al.  Audio-visual based emotion recognition - a new approach , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[154]  Maja Pantic,et al.  Motion history for facial action detection in video , 2004, 2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No.04CH37583).

[155]  Alex Pentland,et al.  Socially aware, computation and communication , 2005, Computer.

[156]  Laurence Devillers,et al.  Reliability of Lexical and Prosodic Cues in Two Real-life Spoken Dialog Corpora , 2004, LREC.

[157]  Ananth N. Iyer,et al.  Emotion Detection From Infant Facial Expressions And Cries , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[158]  Gerhard Rigoll,et al.  Bimodal fusion of emotional data in an automotive environment , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[159]  Lawrence S. Chen,et al.  Joint processing of audio-visual information for the recognition of emotional expressions in human-computer interaction , 2000 .

[160]  K. Kroschel,et al.  Evaluation of natural emotions using self assessment manikins , 2005, IEEE Workshop on Automatic Speech Recognition and Understanding, 2005..

[161]  Jing Xiao,et al.  Automatic analysis and recognition of brow actions and head motion in spontaneous facial behavior , 2004, 2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No.04CH37583).