Spoken emotion recognition using hierarchical classifiers

The recognition of the emotional state of speakers is a multidisciplinary research area that has received considerable interest in recent years. One of its most important goals is to improve voice-based human-machine interaction. Several works in this domain use prosodic features or spectral characteristics of the speech signal, combined with neural networks, Gaussian mixtures and other standard classifiers. Usually, the types of errors in the results are given no acoustic interpretation. In this paper, the spectral characteristics of emotional signals are used to group emotions based on acoustic rather than psychological considerations. Standard classifiers based on Gaussian Mixture Models, Hidden Markov Models and Multilayer Perceptrons are tested. These classifiers have been evaluated with different configurations and input features in order to design a new hierarchical method for emotion classification. The proposed multiple-feature hierarchical method for seven emotions, based on spectral and prosodic information, improves performance over the standard classifiers and over the use of fixed features.
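The abstract does not specify which models or features are used at each level of the hierarchy. Purely as an illustration, the sketch below shows the general shape of a two-level hierarchical classifier: the first stage picks an acoustic group of emotions, and the second stage picks the emotion inside that group, each stage using its own feature subset. Everything here is an assumption, not the paper's method: each class is modelled by a single diagonal-covariance Gaussian (a one-component stand-in for a GMM), and the class names `DiagGaussian` and `HierarchicalClassifier`, the emotion groups, the feature indices and the toy feature values are all hypothetical.

```python
import math
from statistics import fmean, pvariance


class DiagGaussian:
    """One diagonal-covariance Gaussian: a one-component stand-in for a GMM."""

    def __init__(self, rows):
        cols = list(zip(*rows))
        self.mean = [fmean(c) for c in cols]
        # Variance floor keeps the log-likelihood finite for near-constant features.
        self.var = [max(pvariance(c), 1e-6) for c in cols]

    def log_likelihood(self, x):
        return sum(
            -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, self.mean, self.var)
        )


def select(x, idx):
    """Keep only the feature indices used at a given level of the tree."""
    return [x[i] for i in idx]


class HierarchicalClassifier:
    """Level 1 chooses an emotion group; level 2 chooses an emotion inside it."""

    def __init__(self, groups, group_features, emotion_features):
        self.groups = groups                      # group name -> list of emotions
        self.group_features = group_features      # feature indices for level 1
        self.emotion_features = emotion_features  # group name -> indices for level 2

    def fit(self, samples):
        """samples: emotion -> list of feature vectors."""
        self.group_models, self.emotion_models = {}, {}
        for g, emotions in self.groups.items():
            # Group model is trained on the pooled data of all its emotions.
            pooled = [select(x, self.group_features) for e in emotions for x in samples[e]]
            self.group_models[g] = DiagGaussian(pooled)
            idx = self.emotion_features[g]
            for e in emotions:
                self.emotion_models[e] = DiagGaussian([select(x, idx) for x in samples[e]])
        return self

    def predict(self, x):
        # Stage 1: most likely group on the level-1 features.
        x1 = select(x, self.group_features)
        g = max(self.group_models, key=lambda k: self.group_models[k].log_likelihood(x1))
        # Stage 2: most likely emotion within that group on its own features.
        x2 = select(x, self.emotion_features[g])
        return max(self.groups[g], key=lambda e: self.emotion_models[e].log_likelihood(x2))


# Hypothetical toy features per utterance: [energy, pitch, spectral feature].
samples = {
    "anger":     [[0.90, 0.80, 0.20], [0.85, 0.75, 0.25], [0.95, 0.85, 0.15]],
    "happiness": [[0.80, 0.90, 0.70], [0.85, 0.95, 0.75], [0.75, 0.85, 0.80]],
    "sadness":   [[0.20, 0.20, 0.30], [0.25, 0.15, 0.35], [0.15, 0.25, 0.25]],
    "boredom":   [[0.30, 0.30, 0.80], [0.25, 0.35, 0.85], [0.35, 0.25, 0.75]],
}
clf = HierarchicalClassifier(
    groups={"high": ["anger", "happiness"], "low": ["sadness", "boredom"]},
    group_features=[0],                          # energy separates the groups
    emotion_features={"high": [2], "low": [2]},  # spectral feature within a group
).fit(samples)

print(clf.predict([0.90, 0.80, 0.20]))  # anger
```

Giving each node of the tree its own feature subset is one way a hierarchical scheme can use, for example, energy for a coarse split and spectral information for a finer one; a practical system would replace the single Gaussians with multi-component GMMs, HMMs or MLPs and derive the groups from an acoustic analysis of the data.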
