A 3-D Audio-Visual Corpus of Affective Communication

Communication between humans relies heavily on the ability to express and recognize feelings. Research on human-machine interaction therefore needs to address the recognition and simulation of emotional states, a prerequisite of which is the collection of affective corpora. Currently available datasets remain a bottleneck because of the difficulties that arise in acquiring and labeling affective data. In this work, we present a new audio-visual corpus covering what are arguably the two most important modalities humans use to communicate their emotional states: speech and facial expression, the latter captured as dense dynamic 3-D face geometries. We acquire high-quality data in a controlled environment and use video clips to induce affective states. The annotation of the speech signal includes a transcription of the corpus text into a phonological representation, accurate phone segmentation, fundamental frequency extraction, and signal intensity estimation. We employ a real-time 3-D scanner to acquire dense dynamic facial geometries and track the faces throughout the sequences, achieving full spatial and temporal correspondence. The corpus is a valuable tool for applications such as affective visual speech synthesis and view-independent facial expression recognition.
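The acoustic side of the annotation described above combines phone segmentation with per-utterance acoustic measurements. As a minimal illustrative sketch of the last two steps, fundamental frequency and intensity extraction, the following Python code uses the librosa library; it is not the authors' toolchain, and the file name and analysis parameters are assumptions made for the example.

```python
# Illustrative prosodic annotation: F0 contour and RMS intensity per frame.
# Not the corpus pipeline; file name and frame settings are hypothetical.
import librosa
import numpy as np

def extract_prosody(wav_path, frame_length=2048, hop_length=512):
    """Return per-frame times, F0 (Hz), and intensity (dB) for a speech signal."""
    y, sr = librosa.load(wav_path, sr=None)  # keep the native sampling rate

    # F0 contour via probabilistic YIN; unvoiced frames are returned as NaN.
    f0, voiced_flag, _ = librosa.pyin(
        y,
        fmin=librosa.note_to_hz("C2"),   # ~65 Hz, lower bound for speech
        fmax=librosa.note_to_hz("C6"),   # ~1047 Hz, generous upper bound
        sr=sr,
        frame_length=frame_length,
        hop_length=hop_length,
    )

    # Signal intensity as per-frame RMS energy, converted to dB.
    rms = librosa.feature.rms(y=y, frame_length=frame_length,
                              hop_length=hop_length)[0]
    intensity_db = librosa.amplitude_to_db(rms, ref=np.max)

    times = librosa.frames_to_time(np.arange(len(f0)), sr=sr,
                                   hop_length=hop_length)
    return times, f0, intensity_db

if __name__ == "__main__":
    t, f0, intensity = extract_prosody("utterance_001.wav")  # hypothetical file
    voiced = np.count_nonzero(~np.isnan(f0))
    print(f"{voiced} voiced frames out of {len(f0)} total")
```

In a corpus of this kind, such frame-level F0 and intensity tracks would typically be aligned with the phone segmentation so that prosodic features can be queried per phone or per utterance.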
