Modeling Naturalistic Affective States Via Facial, Vocal, and Bodily Expressions Recognition

Affective and human-centered computing have attracted considerable attention in recent years, mainly due to the abundance of devices and environments able to exploit multimodal input on the part of users and adapt their functionality to user preferences or individual habits. In the quest to receive feedback from users in an unobtrusive manner, combining facial and hand gestures with prosody information allows us to infer the users' emotional state, relying on the best-performing modality in cases where another modality suffers from noise or poor sensing conditions. In this paper, we describe a multi-cue, dynamic approach to detecting emotion in naturalistic video sequences. In contrast to strictly controlled recording conditions for audiovisual material, the proposed approach focuses on sequences taken from nearly real-world situations. Recognition is performed via a 'Simple Recurrent Network', which lends itself well to modeling dynamic events in both the user's facial expressions and speech. Moreover, this approach differs from existing work in that it models user expressivity using a dimensional representation of activation and valence, instead of detecting discrete 'universal emotions', which are scarce in everyday human-machine interaction. The algorithm is deployed on an audiovisual database recorded to simulate human-human discourse, which therefore contains less extreme expressivity and subtle variations across a number of emotion labels.
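To make the recognition step concrete, the following is a minimal sketch of an Elman 'Simple Recurrent Network' forward pass of the kind the abstract describes: the hidden layer at each frame receives both the current input features and its own previous state (the 'context' units), which is what lets it model dynamics across a sequence. All dimensions, feature choices, and weight initializations here are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

class ElmanSRN:
    """Minimal Elman Simple Recurrent Network (forward pass only).
    At time t the hidden layer sees the input x_t plus the hidden state
    from t-1, so the output depends on the whole sequence history."""

    def __init__(self, n_in, n_hidden, n_out):
        # small random weights; a real system would train these, e.g. with BPTT
        self.W_in = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.W_ctx = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        self.W_out = rng.normal(0.0, 0.1, (n_out, n_hidden))
        self.b_h = np.zeros(n_hidden)
        self.b_o = np.zeros(n_out)

    def forward(self, sequence):
        """sequence: (T, n_in) array of per-frame feature vectors
        (e.g. facial feature-point displacements concatenated with
        prosody features -- a hypothetical fused representation).
        Returns a (T, n_out) array; with n_out=2 the two outputs can be
        read as activation and valence, each squashed into [-1, 1]."""
        h = np.zeros(self.W_ctx.shape[0])  # context units start at zero
        outputs = []
        for x in sequence:
            h = np.tanh(self.W_in @ x + self.W_ctx @ h + self.b_h)
            outputs.append(np.tanh(self.W_out @ h + self.b_o))
        return np.stack(outputs)

# hypothetical sizes: 10 fused audiovisual features per frame,
# 8 hidden/context units, 2 outputs (activation, valence)
net = ElmanSRN(n_in=10, n_hidden=8, n_out=2)
frames = rng.normal(size=(25, 10))  # a 25-frame dummy sequence
pred = net.forward(frames)          # (25, 2) activation/valence trajectory
```

Because the context units carry state forward, the per-frame prediction at frame t reflects frames 1..t, which is the property that makes this architecture suit the continuous activation-valence representation better than frame-independent classification into discrete emotions.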
