Feature selection in affective speech classification

The increasing role of spoken language interfaces in human-computer interaction has created the conditions for a new area of research: recognizing the emotional state of the speaker from the speech signal. This paper proposes a text-independent method for classifying the emotional content of speech. Different feature selection criteria, namely the Mutual Information Maximization (MIM) feature scoring criterion and its derivatives, are explored and analyzed to measure how useful a feature or feature subset is likely to be in a classifier. The proposed method represents the speech signal with several groups of low-level features, such as energy, zero-crossing rate, Mel-scale frequency bands, and fundamental frequency (pitch), together with their delta and delta-delta regression coefficients and statistical functionals such as regression coefficients, extrema, and moments, and it employs a neural network classifier for the classification task. Experiments are conducted on the EMO-DB dataset with seven primary emotions, including neutral. Results show that the proposed system achieves an average accuracy of over 85% in recognizing the seven emotions with the five best performing feature selection algorithms.
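
The MIM criterion mentioned above scores each candidate feature X_k independently by its mutual information with the class label Y, J_MIM(X_k) = I(X_k; Y), and retains the top-ranked features; its derivatives extend this score with redundancy and complementarity terms over already-selected features. A minimal sketch of the basic MIM ranking step follows, assuming scikit-learn's mutual_info_classif as the mutual-information estimator and placeholder data; the paper's actual feature extraction and estimator are not reproduced here.

```python
# Minimal sketch of MIM-style feature ranking (an illustration, not the
# authors' implementation). Each feature is scored independently by its
# mutual information with the emotion label, J_MIM(X_k) = I(X_k; Y),
# and the k highest-scoring features are kept.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def mim_select(X, y, k):
    """Return indices of the k features with the highest I(X_k; Y)."""
    scores = mutual_info_classif(X, y, random_state=0)  # one score per feature
    return np.argsort(scores)[::-1][:k]

# Hypothetical usage: X would hold per-utterance acoustic functionals
# (energy, zero-crossing rate, Mel-band statistics, pitch, deltas, ...),
# and y the emotion labels; random placeholders stand in for real features.
X = np.random.rand(200, 60)        # 200 utterances x 60 candidate features
y = np.random.randint(0, 7, 200)   # 7 emotion classes, as in EMO-DB
selected = mim_select(X, y, k=20)
```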
