Weighted spectral features based on local Hu moments for speech emotion recognition

Abstract Features strongly influence the results of speech emotion recognition, and Mel-frequency cepstral coefficients (MFCC) are the most commonly used among them. However, MFCC considers neither the relationship among neighboring Mel-filter coefficients within a frame nor the relationship among Mel-filter coefficients of neighboring frames, which may discard many useful features of the spectrogram. This paper presents novel weighted spectral features based on local Hu moments. The idea is motivated by the observation that spectrogram energy varies drastically for some emotion types, such as anger and happiness, while it changes only slightly for others, such as sadness and fear. This phenomenon affects the local energy distribution of the spectrogram along both the time axis and the frequency axis. To describe this local energy distribution, Hu moments computed from local regions of the spectrogram are used, because they measure how strongly the energy is concentrated around the energy centroid of a local region and vary significantly with the speech emotion type. The experiments conducted validate the effectiveness of the proposed features for speech emotion recognition.
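As a rough illustration of computing Hu moments over local regions of a spectrogram, the sketch below tiles a non-negative time-frequency array into blocks and evaluates the first two Hu invariants of each block. It is not the authors' implementation: the block size, the non-overlapping tiling, the restriction to the first two invariants, and the function names `hu_first_two` and `local_hu_features` are all assumptions made for illustration, and the weighting step of the proposed features is not reproduced because the abstract does not specify it.

```python
import numpy as np


def hu_first_two(block):
    """Return the first two Hu moment invariants of a non-negative 2-D block."""
    block = np.asarray(block, dtype=float)
    m00 = block.sum()                      # total energy of the block
    if m00 <= 0:
        return np.zeros(2)                 # empty block: no defined centroid
    rows, cols = np.indices(block.shape)
    # Energy centroid ("center of energy gravity") of the block.
    r_bar = (rows * block).sum() / m00
    c_bar = (cols * block).sum() / m00
    dr, dc = rows - r_bar, cols - c_bar
    # Second-order central moments.
    mu20 = (dr ** 2 * block).sum()
    mu02 = (dc ** 2 * block).sum()
    mu11 = (dr * dc * block).sum()
    # Normalised central moments: eta_pq = mu_pq / m00^(1 + (p+q)/2), here p+q = 2.
    eta20, eta02, eta11 = mu20 / m00 ** 2, mu02 / m00 ** 2, mu11 / m00 ** 2
    phi1 = eta20 + eta02
    phi2 = (eta20 - eta02) ** 2 + 4.0 * eta11 ** 2
    return np.array([phi1, phi2])


def local_hu_features(spec, block_shape=(8, 8)):
    """Tile `spec` into non-overlapping blocks (assumed layout) and compute
    the first two Hu invariants per block; result has shape (n_r, n_c, 2)."""
    spec = np.asarray(spec, dtype=float)
    bh, bw = block_shape
    n_r, n_c = spec.shape[0] // bh, spec.shape[1] // bw
    out = np.zeros((n_r, n_c, 2))
    for i in range(n_r):
        for j in range(n_c):
            out[i, j] = hu_first_two(spec[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw])
    return out
```

In this sketch, a block whose energy sits in a single bin yields a first invariant of zero, while a block with energy spread uniformly yields a larger value, which is the sense in which the first invariant reflects concentration around the energy centroid.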
