Crossmodal and incremental perception of audiovisual cues to emotional speech

In this article we report on two experiments on the perception of audiovisual cues to emotional speech. The article addresses two questions: (1) how do visual cues to emotion from a speaker's face relate to auditory cues, and (2) how quickly are various facial cues to emotion recognized? Both experiments are based on video clips of emotional utterances collected via a variant of the well-known Velten mood-induction method. More specifically, we recorded speakers who displayed positive or negative emotions that were either congruent or incongruent with the emotional lexical content of the uttered sentence. The first experiment is a perception experiment in which Czech participants, who do not speak Dutch, rated the perceived emotional state of Dutch speakers in a bimodal (audiovisual) or a unimodal (audio-only or vision-only) condition. Incongruent emotional speech led to significantly more extreme perceived-emotion scores than congruent emotional speech, and the difference between congruent and incongruent speech was larger for the negative than for the positive conditions. Interestingly, the largest overall differences between congruent and incongruent emotions were found in the audio-only condition, which suggests that posing an incongruent emotion has a particularly strong effect on the spoken realization of emotion.
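To make the design of the first experiment concrete, the sketch below shows one way the perceived-emotion ratings could be aggregated per modality-by-congruence cell and how the congruent/incongruent difference could be computed. It is purely illustrative: the column names, rating scale, and toy data are assumptions, not the authors' materials or analysis code.

```python
# Illustrative sketch only: column names, rating scale, and toy data are
# assumptions, not the authors' materials or analysis code.
import pandas as pd

# Each row represents one rating of one clip by one participant.
ratings = pd.DataFrame({
    "participant": [1, 1, 2, 2, 1, 1, 2, 2],
    "modality":    ["audio-only"] * 4 + ["audiovisual"] * 4,
    "emotion":     ["negative"] * 8,                      # displayed emotion
    "congruence":  ["congruent", "incongruent"] * 4,      # vs. lexical content
    "rating":      [2.5, 1.5, 2.8, 1.2, 2.2, 1.9, 2.6, 2.0],  # perceived emotion
})

# Mean perceived-emotion score per modality x emotion x congruence cell.
cell_means = (ratings
              .groupby(["modality", "emotion", "congruence"])["rating"]
              .mean()
              .unstack("congruence"))

# Size of the congruent/incongruent difference per modality and emotion,
# mirroring the comparison reported in the abstract.
cell_means["difference"] = cell_means["incongruent"] - cell_means["congruent"]
print(cell_means)
```

In such a layout, the abstract's central finding would correspond to the "difference" column being largest in magnitude for the audio-only rows and for the negative-emotion cells.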
