Class-specific multiple classifiers scheme to recognize emotions from speech signals

The emotion recognition performance of AR parameters of different orders is investigated. AR reflection coefficients recognize emotions better than LPC. A new class-specific multiple classifiers scheme is proposed for speech emotion recognition. The proposed method uses a dedicated feature vector and classifier for each emotion. The class-specific multiple classifiers scheme improves recognition accuracy.

Automatic emotion recognition from speech signals is an important research area that adds value to machine intelligence. Pitch, duration, energy, and Mel-frequency cepstral coefficients (MFCC) are the most widely used features in speech emotion recognition, and either a single classifier or a combination of classifiers is used to recognize emotions from them. The present work investigates how well features derived from autoregressive (AR) parameters, namely gain and reflection coefficients in addition to the traditional linear prediction coefficients (LPC), recognize emotions from speech signals. The classification performance of these AR features is studied using discriminant, k-nearest neighbor (KNN), Gaussian mixture model (GMM), back-propagation artificial neural network (ANN), and support vector machine (SVM) classifiers; we find that reflection-coefficient features recognize emotions better than LPC. To improve recognition accuracy further, we propose a class-specific multiple classifiers scheme consisting of parallel classifiers, each optimized for one emotional class. The classifier for each class is built from a feature identified in a pool of features and a classifier identified in a pool of classifiers, chosen to maximize recognition of that particular emotion. The outputs of the individual classifiers are combined by a decision-level fusion technique. Experimental results show that the proposed scheme improves emotion recognition accuracy, with a further improvement when MFCC features are added to the feature pool.
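
For reference, the AR parameters the paper compares (LPC, reflection coefficients, and gain) can all be obtained from a speech frame with the standard Levinson-Durbin recursion. The sketch below is a minimal NumPy illustration of that textbook procedure, not the authors' code; the frame length, window, and model order 12 are illustrative assumptions.

```python
import numpy as np

def autocorr(frame, order):
    """Biased autocorrelation r[0..order] of a (windowed) speech frame."""
    n = len(frame)
    return np.array([frame[: n - lag] @ frame[lag:] for lag in range(order + 1)])

def levinson_durbin(r, order):
    """Levinson-Durbin recursion on autocorrelation r[0..order].

    Returns (lpc, reflection, gain):
      lpc        -- a[1..p] of the inverse filter A(z) = 1 + a1 z^-1 + ... + ap z^-p
      reflection -- PARCOR coefficients k[1..p], each in (-1, 1) for a stable model
      gain       -- final prediction-error power E_p (the squared gain G**2
                    of the all-pole model H(z) = G / A(z))
    """
    a = np.zeros(order + 1)
    k = np.zeros(order)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + a[1:i] @ r[i - 1:0:-1]
        ki = -acc / err
        k[i - 1] = ki
        # Update predictor coefficients; RHS is evaluated before assignment,
        # so old values of a[1..i-1] are used on both sides.
        a[1:i], a[i] = a[1:i] + ki * a[i - 1:0:-1], ki
        err *= 1.0 - ki * ki
    return a[1:], k, err

# Illustrative 20 ms frame at 16 kHz (a windowed sinusoid stands in for speech).
fs = 16000
t = np.arange(int(0.02 * fs)) / fs
frame = np.sin(2 * np.pi * 150 * t) * np.hamming(len(t))
lpc, refl, gain = levinson_durbin(autocorr(frame, 12), 12)
```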
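The class-specific scheme itself can be sketched as one one-vs-rest classifier per emotion, each paired with its own feature extractor, with the per-class scores fused at the decision level. The fusion rule below (argmax over per-class confidences), the scikit-learn classifiers, and the feature-extractor names are assumptions for illustration; the paper specifies only that each class's feature and classifier come from pools and that outputs are combined by decision-level fusion.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

class ClassSpecificEnsemble:
    """One (feature, classifier) pair per emotion, fused at decision level."""

    def __init__(self, per_class_config):
        # per_class_config: {emotion: (feature_fn, classifier)} where each
        # pair is assumed to have been selected beforehand (from the feature
        # and classifier pools) to maximise that emotion's recognition rate.
        self.config = per_class_config
        self.models = {}

    def fit(self, signals, labels):
        labels = np.asarray(labels)
        for emotion, (feat_fn, clf) in self.config.items():
            X = np.vstack([feat_fn(s) for s in signals])
            # One-vs-rest target: this classifier only detects its own emotion.
            self.models[emotion] = clf.fit(X, (labels == emotion).astype(int))
        return self

    def predict(self, signals):
        emotions = list(self.config)
        # Score every utterance with each class-specific classifier,
        # each operating on its own feature representation.
        scores = np.column_stack([
            self.models[e].predict_proba(
                np.vstack([self.config[e][0](s) for s in signals]))[:, 1]
            for e in emotions])
        # Decision-level fusion: pick the emotion whose dedicated
        # classifier is most confident (one simple fusion rule).
        return [emotions[i] for i in scores.argmax(axis=1)]

# Hypothetical usage; reflection_feats and mfcc_feats stand in for
# extractors from the paper's feature pool:
# ensemble = ClassSpecificEnsemble({
#     "anger":   (reflection_feats, SVC(probability=True)),
#     "sadness": (mfcc_feats, KNeighborsClassifier(n_neighbors=5)),
# }).fit(train_signals, train_labels)
# predictions = ensemble.predict(test_signals)
```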
