Speech-based emotion classification using multiclass SVM with hybrid kernel and thresholding fusion

Emotion classification is essential for understanding human interactions and is therefore a vital component of behavioral studies. Although numerous algorithms have been developed, their classification accuracy still falls short of what real-world systems require. In this paper, we evaluate an approach in which basic acoustic features are extracted from speech samples and classified with the One-Against-All (OAA) Support Vector Machine (SVM) learning algorithm. We use a novel hybrid kernel, choosing the optimal kernel function for each individual OAA classifier. The outputs of the OAA classifiers are normalized and combined through a thresholding fusion mechanism to produce the final emotion label. Samples with low 'relative confidence' are left 'unclassified', which further improves classification accuracy. Results show that the decision-level recall of our approach for six-class emotion classification is 80.5%, outperforming a state-of-the-art approach on the same dataset.
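The pipeline described above can be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the toy features, the particular kernel assignments, the min-max normalization, and the threshold value are all assumptions standing in for details given in the full paper.

```python
# Sketch of One-Against-All SVMs with per-classifier ("hybrid") kernels and
# thresholding fusion with a 'relative confidence' rejection option.
# All concrete choices below (data, kernels, threshold) are illustrative.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy stand-in for acoustic feature vectors from three emotion classes.
n_classes, n_per_class, n_features = 3, 40, 8
X = np.vstack([rng.normal(loc=3.0 * c, scale=1.0, size=(n_per_class, n_features))
               for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), n_per_class)
X = StandardScaler().fit_transform(X)

# "Hybrid kernel": each OAA classifier may use a different kernel function.
kernels = ["rbf", "linear", "poly"]
classifiers = []
for c in range(n_classes):
    clf = SVC(kernel=kernels[c % len(kernels)])
    clf.fit(X, (y == c).astype(int))  # one-against-all binary problem
    classifiers.append(clf)

def classify(x, threshold=0.2):
    """Thresholding fusion: normalize OAA decision values and reject samples
    whose relative confidence (gap between the top two scores) is too small."""
    scores = np.array([clf.decision_function(x.reshape(1, -1))[0]
                       for clf in classifiers])
    span = scores.max() - scores.min()
    norm = (scores - scores.min()) / span if span > 0 else np.zeros_like(scores)
    ranked = np.sort(norm)
    if ranked[-1] - ranked[-2] < threshold:
        return None  # left 'unclassified'
    return int(np.argmax(norm))

preds = [classify(x) for x in X]
```

With well-separated toy classes, almost all samples are confidently classified; on real speech features the rejection branch trades coverage for accuracy, which is how the unclassified category raises decision-level recall.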
