Emotion Recognition from Speech Signals using Excitation Source and Spectral Features

Emotion recognition from speech signals is a long-standing research problem. Previous work has shown that prosodic and spectral features dominate when it comes to recognizing emotions. However, a speech signal also carries source-level (excitation) information, which is lost when only these features are used. In this work, we combine several spectral features with several excitation source features to see how well the resulting model performs on the emotion recognition task. For the task at hand we use three databases: the Berlin Emotional Database (Berlin Emo-DB), the Surrey Audio-Visual Expressed Emotion (SAVEE) database, and the Toronto Emotional Speech Set (TESS) database. These databases were chosen because the variation they offer makes them effective for judging the robustness of the recognition model. We chose Sequential Minimal Optimization (SMO) and Random Forest to perform the classification.
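To make the feature-combination pipeline concrete, the following is a minimal sketch under stated assumptions: MFCCs stand in for the spectral features, MFCCs of the linear-prediction (LP) residual stand in for the excitation source features, and scikit-learn's SVC (whose underlying libsvm solver is SMO-based) approximates the SMO classifier. The file names and labels are hypothetical placeholders, not the paper's actual data or exact feature set.

# Minimal sketch (assumptions noted above): MFCCs as spectral features,
# LP-residual MFCCs as excitation source features, SVC / Random Forest
# as classifiers.
import numpy as np
import librosa
from scipy.signal import lfilter
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

def lp_residual(y, order=12):
    """Excitation source signal: inverse-filter y with its LP coefficients."""
    a = librosa.lpc(y, order=order)   # LP polynomial [1, a1, ..., ap]
    return lfilter(a, [1.0], y)       # prediction error = LP residual

def extract_features(path):
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)              # spectral
    res_mfcc = librosa.feature.mfcc(y=lp_residual(y), sr=sr,
                                    n_mfcc=13)                      # source
    # Utterance-level statistics give one fixed-length vector per file.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1),
                           res_mfcc.mean(axis=1), res_mfcc.std(axis=1)])

# Hypothetical paths/labels; replace with Emo-DB / SAVEE / TESS data.
files = ["anger_01.wav", "happiness_01.wav"]
labels = np.array(["anger", "happiness"])
X = np.array([extract_features(f) for f in files])

for clf in (SVC(kernel="rbf"), RandomForestClassifier(n_estimators=100)):
    clf.fit(X, labels)   # train; evaluate with cross-validation in practice

In practice the combined vector would include the paper's full spectral and source feature sets, but the structure (per-database feature extraction followed by two classifiers) is the same.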
