Expression of affect in spontaneous speech: Acoustic correlates and automatic detection of irritation and resignation

The majority of previous studies on vocal expression have been conducted on posed expressions. In contrast, we utilized a large corpus of authentic affective speech recorded from real-life voice-controlled telephone services. Listeners rated a selection of 200 utterances from this corpus with regard to level of perceived irritation, resignation, neutrality, and emotion intensity. The selected utterances came from 64 different speakers, each of whom provided both neutral and affective stimuli. All utterances were further automatically analyzed with respect to a comprehensive set of acoustic measures related to F0, intensity, formants, voice source, and temporal characteristics of speech. Results first showed several significant acoustic differences between utterances classified as neutral and utterances classified as irritated or resigned, using a within-persons design. Second, listeners' ratings on each scale were associated with several acoustic measures. In general, the acoustic correlates of irritation, resignation, and emotion intensity were similar to previous findings obtained with posed expressions, though the effect sizes were smaller for the authentic expressions. Third, automatic classification (using LDA classifiers both with and without speaker adaptation) of irritated, resigned, and neutral utterances performed at a level comparable to human performance, though human listeners and machines did not necessarily classify individual utterances similarly. Fourth, clearly perceived exemplars of irritation and resignation were rare in our corpus. These findings are discussed in relation to future research.
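The classification approach described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the feature set, class shifts, and data are synthetic and hypothetical, and per-speaker z-normalization stands in as a crude form of speaker adaptation. LDA is implemented here in its equal-priors form, i.e. assigning each utterance to the class whose mean is nearest in Mahalanobis distance under a pooled within-class covariance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-utterance "acoustic" features: [mean F0, mean intensity,
# speech rate]. Class shifts are illustrative only (irritation: raised
# F0/intensity; resignation: lowered), not values from the study.
CLASS_SHIFT = {
    "neutral":   np.array([0.0, 0.0, 0.0]),
    "irritated": np.array([2.0, 2.0, 1.0]),
    "resigned":  np.array([-2.0, -1.0, -1.5]),
}
CLASSES = list(CLASS_SHIFT)

def make_corpus(n_speakers=20, utts_per_class=5):
    """Synthetic corpus: large speaker offsets plus class shifts plus noise."""
    X, y, spk = [], [], []
    for s in range(n_speakers):
        base = rng.normal(0.0, 3.0, size=3)  # speaker-specific offset
        for c, shift in CLASS_SHIFT.items():
            for _ in range(utts_per_class):
                X.append(base + shift + rng.normal(0.0, 0.5, size=3))
                y.append(c)
                spk.append(s)
    return np.array(X), np.array(y), np.array(spk)

def speaker_znorm(X, spk):
    """Crude 'speaker adaptation': z-normalize each feature within speaker."""
    Xn = X.astype(float).copy()
    for s in np.unique(spk):
        m = spk == s
        Xn[m] = (X[m] - X[m].mean(axis=0)) / (X[m].std(axis=0) + 1e-9)
    return Xn

def lda_fit(X, y):
    """Class means and inverse pooled within-class covariance."""
    means = {c: X[y == c].mean(axis=0) for c in CLASSES}
    resid = np.vstack([X[y == c] - means[c] for c in CLASSES])
    cov_inv = np.linalg.inv(np.cov(resid, rowvar=False))
    return means, cov_inv

def lda_predict(X, means, cov_inv):
    """Assign each row to the class with smallest Mahalanobis distance."""
    def score(x, m):
        d = x - m
        return -d @ cov_inv @ d
    return np.array([max(CLASSES, key=lambda c: score(x, means[c])) for x in X])

X, y, spk = make_corpus()
accs = {}
for name, feats in [("raw", X), ("speaker-normalized", speaker_znorm(X, spk))]:
    means, cov_inv = lda_fit(feats, y)
    accs[name] = float((lda_predict(feats, means, cov_inv) == y).mean())
    print(f"{name}: resubstitution accuracy = {accs[name]:.2f}")
```

Because the synthetic speaker offsets are large relative to the class shifts, normalizing within speaker typically raises accuracy markedly, mirroring the motivation for the within-persons design and speaker adaptation mentioned in the abstract.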
