Analysis of Emotional Speech - A Review

Speech carries information not only about the lexical content but also about the age, gender, identity, and emotional state of the speaker. Speech produced in different emotional states is accompanied by distinct changes in the speech production mechanism. In this chapter, we present a review of analysis methods used for emotional speech. In particular, we focus on issues in data collection, feature representation, and the development of automatic emotion recognition systems. The significance of the excitation source component of speech production in emotional states is examined in detail, and the derived excitation source features are shown to carry correlates of emotion.

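To make the feature-representation and recognition stages mentioned above concrete, the following is a minimal, hypothetical sketch of an utterance-level emotion classifier built on simple prosodic statistics (F0 and energy). It is only an illustration of the general pipeline, not the method of this chapter or of any particular cited work; the file paths, labels, pitch range, and classifier settings are assumptions chosen for the example.

```python
# Illustrative sketch: prosodic statistics per utterance + an SVM classifier.
# All parameter choices below (sampling rate, pitch range, kernel) are assumptions.
import numpy as np
import librosa
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def utterance_features(path, sr=16000):
    """Return a small vector of prosodic statistics for one utterance."""
    y, sr = librosa.load(path, sr=sr)

    # Frame-level fundamental frequency (pYIN); unvoiced frames are NaN.
    f0, _, _ = librosa.pyin(y, fmin=60.0, fmax=400.0, sr=sr)
    f0 = f0[~np.isnan(f0)]

    # Frame-level energy (RMS), in the log domain for stability.
    log_e = np.log(librosa.feature.rms(y=y)[0] + 1e-10)

    return np.array([
        f0.mean() if f0.size else 0.0,               # mean F0
        f0.std() if f0.size else 0.0,                # F0 variability
        (f0.max() - f0.min()) if f0.size else 0.0,   # F0 range
        log_e.mean(),                                # mean log energy
        log_e.std(),                                 # energy variability
    ])


def train_emotion_classifier(paths, labels):
    """Fit a feature scaler and an SVM on utterance-level prosodic features."""
    X = np.vstack([utterance_features(p) for p in paths])
    scaler = StandardScaler().fit(X)
    clf = SVC(kernel="rbf", C=1.0).fit(scaler.transform(X), labels)
    return scaler, clf


# Usage (hypothetical file list and emotion labels):
# scaler, clf = train_emotion_classifier(["angry_01.wav", "neutral_01.wav"],
#                                         ["anger", "neutral"])
# prediction = clf.predict(scaler.transform([utterance_features("test.wav")]))
```

In practice, as the review discusses, such prosodic statistics are typically complemented by spectral and excitation source features, and the classifier choice varies across studies.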