Induction, recording and recognition of natural emotions from facial expressions and speech prosody

Abstract Recording and annotating a multimodal database of natural expressivity is a task that requires careful planning and implementation before feature extraction and recognition algorithms can even be applied. The requirements and characteristics of such databases differ inherently from those of acted behaviour, both in the unconstrained expressivity of the human participants and in the emotions that are expressed. In this paper, we describe a method to induce, record and annotate natural emotions, which was used to provide multimodal data for dynamic emotion recognition from facial expressions and speech prosody; results from a dynamic recognition algorithm based on recurrent neural networks indicate that multimodal processing surpasses both speech-only and visual-only analysis by a wide margin. The SAL database was used within the Humaine Network of Excellence as common ground for research on everyday, natural emotions.
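The abstract describes dynamic, multimodal recognition with recurrent neural networks over facial and prosodic features. The following is a minimal illustrative sketch, not the authors' implementation: it assumes feature-level fusion of per-frame facial and prosody feature vectors fed into an Elman-style recurrent layer, with all dimensions and weights chosen arbitrarily for demonstration.

```python
import numpy as np

# Hypothetical sketch: feature-level fusion of facial and prosodic features
# processed by an Elman-style recurrent network. All sizes are illustrative
# assumptions; a real system would learn the weights from annotated data.

rng = np.random.default_rng(0)

FACE_DIM, PROSODY_DIM, HIDDEN, CLASSES = 10, 6, 16, 4  # assumed dimensions

# Randomly initialised weights stand in for trained parameters.
W_in = rng.standard_normal((HIDDEN, FACE_DIM + PROSODY_DIM)) * 0.1
W_rec = rng.standard_normal((HIDDEN, HIDDEN)) * 0.1
W_out = rng.standard_normal((CLASSES, HIDDEN)) * 0.1

def recognise(face_seq, prosody_seq):
    """Run one sequence of fused per-frame features through the RNN
    and return a per-frame probability distribution over classes."""
    h = np.zeros(HIDDEN)
    outputs = []
    for face, prosody in zip(face_seq, prosody_seq):
        x = np.concatenate([face, prosody])    # feature-level fusion
        h = np.tanh(W_in @ x + W_rec @ h)      # Elman recurrence over time
        logits = W_out @ h
        p = np.exp(logits - logits.max())      # numerically stable softmax
        outputs.append(p / p.sum())
    return np.stack(outputs)

# Toy input: 5 frames of random facial and prosodic feature vectors.
probs = recognise(rng.standard_normal((5, FACE_DIM)),
                  rng.standard_normal((5, PROSODY_DIM)))
print(probs.shape)  # one probability vector per frame: (5, 4)
```

The recurrent state lets each frame's prediction depend on the preceding context, which is the essential property for recognising emotions that unfold dynamically rather than frame by frame.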
