Paper information - An improved characterization methodology to efficiently deal with the speech emotion recognition problem

Recognizing a speaker's emotional state is expected to become one of the most common tasks in human-computer interaction. This task is known as Speech Emotion Recognition (SER). Previous works have developed characterizations that rely heavily on some form of feature selection to choose the best subset of features. To our knowledge, no effort has been invested in further processing the original features themselves in order to improve classification. In this work, a methodology for feature preprocessing is presented. Our characterization method extracts different characteristics, as well as statistics, from a speech signal. These characteristics then go through a preprocessing phase that enhances classification efficiency. After this, a two-stage classification scheme is used: in the first stage, k-Means is used for clustering, and in the second stage, several standard classifiers are applied. Across the classifiers, except for SVM, this strategy consistently achieves a higher classification rate (91–100%) than those reported in previous works.
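The two-stage scheme described in the abstract (k-Means clustering followed by a standard classifier) could be sketched roughly as follows. This is only an illustration under several assumptions, not the authors' implementation: the abstract does not say how the two stages are connected, so the sketch assumes one classifier is trained per cluster; a nearest-centroid rule stands in for the "standard classifiers"; the feature preprocessing phase is omitted because the abstract does not specify it; and the class name `TwoStageClassifier`, the deterministic k-Means initialization, and the 2-D toy features are all hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def nearest(centers, p):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda i: dist(centers[i], p))

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-Means. Initial centers are evenly spaced samples
    from the input list (a simplification that keeps the sketch deterministic)."""
    centers = [list(points[i * len(points) // k]) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centers, p)].append(p)
        # An empty cluster keeps its previous center.
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers

class TwoStageClassifier:
    """Stage 1: k-Means assigns a sample to a cluster.
    Stage 2: a per-cluster nearest-centroid classifier (assumed stand-in)."""

    def __init__(self, k=2):
        self.k = k

    def fit(self, X, y):
        self.centers = kmeans(X, self.k)
        by_cluster, by_label = {}, {}
        for p, label in zip(X, y):
            c = nearest(self.centers, p)
            by_cluster.setdefault(c, {}).setdefault(label, []).append(p)
            by_label.setdefault(label, []).append(p)
        # One label-centroid model per cluster, plus a global fallback model.
        self.models = {c: {lab: mean(pts) for lab, pts in groups.items()}
                       for c, groups in by_cluster.items()}
        self.fallback = {lab: mean(pts) for lab, pts in by_label.items()}
        return self

    def predict(self, p):
        model = self.models.get(nearest(self.centers, p), self.fallback)
        return min(model, key=lambda lab: dist(model[lab], p))

# Hypothetical 2-D toy features, three samples per emotion.
X = [[0.0, 0.0], [0.2, 0.1], [0.1, -0.1],      # neutral
     [0.0, 3.0], [0.1, 3.2], [-0.1, 2.9],      # sad
     [10.0, 0.0], [10.2, 0.1], [9.9, -0.1],    # angry
     [10.0, 3.0], [10.1, 3.1], [9.8, 2.9]]     # happy
y = ["neutral"] * 3 + ["sad"] * 3 + ["angry"] * 3 + ["happy"] * 3

clf = TwoStageClassifier(k=2).fit(X, y)
print(clf.predict([0.1, 0.2]), clf.predict([10.1, 2.7]))
```

In this toy setup the first stage separates the samples into two coarse groups, so each second-stage classifier only has to distinguish the emotions that fall inside its own cluster. Real SER features would be high-dimensional, and any standard classifier could replace the nearest-centroid rule.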

Jaime Cerda Jacobo | Bryan E. Martínez
