Efficient speech emotion recognition using binary support vector machines & multiclass SVM

This paper presents the construction of binary Support Vector Machines (SVMs) and their significance for efficient Speech Emotion Recognition (SER). The German emotional speech corpus EmoDB is used in this study. Seven binary SVMs are constructed, one for each of the seven emotions in EmoDB: Anger vs. Not Anger, Boredom vs. Not Boredom, Disgust vs. Not Disgust, Fear vs. Not Fear, Happy vs. Not Happy, Sad vs. Not Sad, and Neutral vs. Not Neutral. Features for these seven binary SVMs are selected using Correlation-based Feature Selection (CFS) with Sequential Forward Selection (SFS). One multiclass SVM is also constructed. Under ten-fold cross-validation, the binary SVMs achieve an average accuracy of 95.32% and the multiclass SVM 62.85%. The seven binary SVMs and the multiclass SVM are then fused using a combinator algorithm, with all SVMs run in parallel, each receiving its classifier-specific features as input. On the test set, the fused model yields an average accuracy of 92.25% for the binary SVMs and 77.07% for the multiclass SVM, and with the combinator algorithm it achieves an overall accuracy of 87.86%, a significant improvement over the accuracies reported in previous studies.
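The ensemble described above can be sketched in scikit-learn. This is a minimal illustration, not the paper's implementation: the fusion rule shown (trust a binary SVM when exactly one fires, otherwise fall back to the multiclass decision) is a hypothetical combinator, the data is synthetic in place of EmoDB acoustic features, and the per-classifier CFS + SFS feature selection is omitted.

```python
# Hedged sketch of the binary + multiclass SVM ensemble (assumed combinator rule).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

EMOTIONS = ["anger", "boredom", "disgust", "fear", "happy", "sad", "neutral"]

# Synthetic stand-in for EmoDB acoustic features; the study selects a
# separate feature subset per classifier via CFS with SFS (omitted here).
X, y = make_classification(n_samples=700, n_features=30, n_informative=12,
                           n_classes=7, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One binary "emotion vs. not-emotion" SVM per class.
binary_svms = [SVC(kernel="rbf").fit(X_tr, (y_tr == k).astype(int))
               for k in range(len(EMOTIONS))]

# One multiclass SVM over all seven emotions.
multi_svm = SVC(kernel="rbf").fit(X_tr, y_tr)

def combinator(x):
    """Fuse the eight SVMs: if exactly one binary SVM claims the sample,
    trust it; otherwise fall back to the multiclass decision."""
    votes = [k for k, clf in enumerate(binary_svms) if clf.predict(x)[0] == 1]
    if len(votes) == 1:
        return votes[0]
    return int(multi_svm.predict(x)[0])

preds = np.array([combinator(X_te[i:i + 1]) for i in range(len(X_te))])
accuracy = (preds == y_te).mean()
```

In practice each of the eight SVMs can be evaluated in parallel, since they are independent; the combinator only needs their individual decisions.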
