Features and classifiers for emotion recognition from speech: a survey from 2000 to 2011

Speaker emotion recognition is achieved through processing methods that isolate the speech signal and extract selected features for the final classification. In terms of acoustics, speech processing techniques offer valuable paralinguistic information, derived mainly from prosodic and spectral features. In some cases, the process is assisted by speech recognition systems, which contribute linguistic information to the classification. Both frameworks deal with a very challenging problem, as emotional states do not have clear-cut boundaries and often differ from person to person. In this article, research papers that investigate emotion recognition from audio channels are surveyed and classified, based mostly on the features they extract and select and on their classification methodology. Important topics across the different classification techniques are discussed, such as the databases available for experimentation, appropriate feature extraction and selection methods, classifiers, and performance issues, with emphasis on research published in the last decade. The survey also discusses open trends, along with directions for future research on this topic.
