Deep emotion recognition using prosodic and spectral feature extraction and classification based on cross validation and bootstrap

Despite the existence of robust models for identifying basic emotions, the ability to classify a large group of emotions reliably is yet to be developed. Hence, the objective of this paper is to develop an efficient technique to identify emotions with an accuracy comparable to that of humans. The array of emotions addressed in this paper goes far beyond what is present on the circumplex diagram. Because of the correlation and ambiguity inherent in emotions, both prosodic and spectral features of speech are considered during feature extraction. Feature selection algorithms are then applied to obtain a subset of relevant features. Owing to the low dimensionality of the feature space, several cross-validation methods are employed in combination with different classifiers, and their performances are compared. In addition to cross-validation, the bootstrap error estimate is also calculated, and a combination of both is used as an overall estimate of the classification accuracy of the model.
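The combined error estimate described above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the nearest-centroid classifier, the synthetic stand-in features, the .632 bootstrap variant, and the simple averaging of the two estimates are all assumptions made for the sake of a runnable example.

```python
# Sketch: combining k-fold cross-validation with a .632 bootstrap error
# estimate. The classifier, data, and combination rule are illustrative
# assumptions, not the method from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class feature vectors (stand-ins for prosodic/spectral features).
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(2, 1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)

def nearest_centroid_error(X_tr, y_tr, X_te, y_te):
    """Train a nearest-centroid classifier and return its test-split error."""
    centroids = np.array([X_tr[y_tr == c].mean(axis=0) for c in (0, 1)])
    dists = np.linalg.norm(X_te[:, None, :] - centroids[None, :, :], axis=2)
    return float((dists.argmin(axis=1) != y_te).mean())

def kfold_error(X, y, k=5):
    """Average error over k held-out folds."""
    folds = np.array_split(rng.permutation(len(X)), k)
    errs = []
    for i in range(k):
        te = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        errs.append(nearest_centroid_error(X[tr], y[tr], X[te], y[te]))
    return float(np.mean(errs))

def bootstrap632_error(X, y, B=50):
    """Efron's .632 estimate: blend resubstitution and out-of-bag error."""
    n = len(X)
    resub = nearest_centroid_error(X, y, X, y)  # optimistic training error
    oob_errs = []
    for _ in range(B):
        boot = rng.integers(0, n, n)            # sample n indices with replacement
        oob = np.setdiff1d(np.arange(n), boot)  # points left out of this resample
        if len(oob):
            oob_errs.append(nearest_centroid_error(X[boot], y[boot], X[oob], y[oob]))
    return 0.368 * resub + 0.632 * float(np.mean(oob_errs))

cv_err = kfold_error(X, y)
boot_err = bootstrap632_error(X, y)
overall = 0.5 * (cv_err + boot_err)  # simple average as the combined estimate (assumption)
print(f"CV error: {cv_err:.3f}, bootstrap error: {boot_err:.3f}, combined: {overall:.3f}")
```

With well-separated synthetic classes all three estimates should be low; on real emotional-speech features the spread between the cross-validation and bootstrap estimates is what motivates combining them.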
