Emotion Recognition from Speech using Discriminative Features

ABSTRACT Building an accurate Speech Emotion Recognition (SER) system depends on extracting emotion-relevant features from speech. In this paper, the features extracted from the speech samples are Mel-Frequency Cepstral Coefficients (MFCC), energy, pitch, spectral flux, spectral roll-off, and spectral stationarity. To avoid the 'curse of dimensionality', statistical summaries, i.e. mean, variance, median, maximum, minimum, and index of dispersion, are computed over the extracted frame-level features. A Support Vector Machine (SVM) is used to classify the emotion of an unknown test sample, owing to its proven efficiency. Experiments on the chosen features yield an average classification accuracy of 86.6% with a one-vs-all multi-class SVM, which improves to 100% when the task is reduced to binary classification. Classifier metrics, viz. precision, recall, and F-score, show that the proposed system achieves improved accuracy on the Emo-DB database.
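The feature-summarization step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the frame matrix here is a random stand-in for real frame-level features (e.g. 13 MFCCs per frame), and the index of dispersion is taken as the usual variance-to-mean ratio.

```python
import numpy as np

def summarize(frames):
    """Collapse a (n_frames, n_features) matrix of frame-level features
    into one fixed-length utterance vector using the six statistics
    named in the abstract: mean, variance, median, maximum, minimum,
    and index of dispersion (variance / mean)."""
    mean = frames.mean(axis=0)
    var = frames.var(axis=0)
    stats = [
        mean,
        var,
        np.median(frames, axis=0),
        frames.max(axis=0),
        frames.min(axis=0),
        var / mean,  # index of dispersion; assumes strictly positive means
    ]
    return np.concatenate(stats)

# Synthetic example: 200 frames of 13 MFCC-like coefficients.
rng = np.random.default_rng(0)
frames = rng.random((200, 13)) + 1.0  # shifted positive so var/mean is defined
vec = summarize(frames)
print(vec.shape)  # (78,) -- 13 features x 6 statistics
```

The resulting fixed-length vector is what a one-vs-all SVM would consume: each utterance, regardless of duration, maps to the same number of dimensions, which is precisely how the statistical summaries sidestep the dimensionality problem of variable-length frame sequences.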
