Review on Speech Emotion Recognition

This paper surveys the state of the art in speech emotion recognition (SER) and presents an outlook on future trends in SER technology. First, the survey summarizes and analyzes SER in detail from five perspectives: emotion representation models, representative emotional speech corpora, extraction of emotion-related acoustic features, SER methods, and applications. Then, based on this survey, it summarizes the challenges facing current SER research. The paper aims to provide deep insight into the mainstream methods and recent progress in this field, and presents a detailed comparison and analysis of these methods.
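A common way to extract emotion-related acoustic features has two steps. First, compute frame-level low-level descriptors. Then summarize each descriptor over the utterance with statistical functionals. The sketch below illustrates this with two simple descriptors, short-time energy and zero-crossing rate. Several details are illustrative assumptions rather than the method of this paper: the function names, the 400-sample frames with a 160-sample hop (25 ms / 10 ms at an assumed 16 kHz sampling rate), and the choice of mean/std/max/min as functionals.

```python
from statistics import mean, pstdev


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def utterance_features(x, frame_len=400, hop=160):
    """Hypothetical helper: summarize frame-level descriptors per utterance.

    Assumed defaults: 400-sample frames and a 160-sample hop,
    i.e. 25 ms / 10 ms at an assumed 16 kHz sampling rate.
    """
    frames = frame_signal(x, frame_len, hop)
    if not frames:
        raise ValueError("signal shorter than one frame")
    feats = {}
    for name, fn in (("energy", short_time_energy), ("zcr", zero_crossing_rate)):
        values = [fn(f) for f in frames]  # one value per frame
        # Statistical functionals turn a variable-length sequence
        # into a fixed-length utterance-level vector.
        feats[f"{name}_mean"] = mean(values)
        feats[f"{name}_std"] = pstdev(values)
        feats[f"{name}_max"] = max(values)
        feats[f"{name}_min"] = min(values)
    return feats
```

For example, `utterance_features(samples)` on one second of 16 kHz audio returns a fixed-size dictionary of eight values, whatever the input length. That fixed size is what lets utterances of different durations be fed to a conventional classifier. Practical SER systems use many more descriptors, such as pitch, formants, MFCCs and voice-quality measures.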
