Divide-and-Conquer based Ensemble to Spot Emotions in Speech using MFCC and Random Forest

Besides spoken words, speech signals also carry information about speaker gender, age, and emotional state, which can be used in a variety of speech analysis applications. In this paper, a divide-and-conquer strategy for ensemble classification is proposed to recognize emotions in speech. The intrinsic hierarchy of emotions is exploited to construct an emotion tree, which breaks the emotion recognition task into smaller subtasks. The proposed framework generates predictions in three phases. First, emotion is detected in the input speech signal by classifying it as neutral or emotional. If the speech is classified as emotional, it is further classified in the second phase into positive and negative classes. Finally, the individual positive or negative emotion is identified based on the outcomes of the previous stages. Experiments on a widely used benchmark dataset show that the proposed method achieves higher recognition rates than several existing approaches.
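The three-phase pipeline described above can be sketched as a cascade of binary and leaf-level classifiers, one per node of the emotion tree. The sketch below is a minimal illustration, not the paper's implementation: it assumes MFCC feature vectors have already been extracted per utterance, uses scikit-learn Random Forests for every node, and the class/label names (`HierarchicalEmotionClassifier`, the `valence` mapping) are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


class HierarchicalEmotionClassifier:
    """Divide-and-conquer ensemble sketch: one Random Forest per emotion-tree node.

    Stage 1: neutral vs. emotional; Stage 2: positive vs. negative valence;
    Stage 3: individual emotion within the chosen valence branch.
    """

    def __init__(self, n_estimators=100, random_state=0):
        self.stage1 = RandomForestClassifier(n_estimators, random_state=random_state)
        self.stage2 = RandomForestClassifier(n_estimators, random_state=random_state)
        self.stage3_pos = RandomForestClassifier(n_estimators, random_state=random_state)
        self.stage3_neg = RandomForestClassifier(n_estimators, random_state=random_state)

    def fit(self, X, labels, valence):
        # X: per-utterance MFCC feature vectors (assumed precomputed);
        # labels: fine-grained emotion names;
        # valence: dict mapping each emotion to 'neutral'/'positive'/'negative'.
        labels = np.asarray(labels)
        v = np.array([valence[l] for l in labels])
        self.stage1.fit(X, v == "neutral")
        emotional = v != "neutral"
        self.stage2.fit(X[emotional], v[emotional] == "positive")
        pos, neg = v == "positive", v == "negative"
        self.stage3_pos.fit(X[pos], labels[pos])
        self.stage3_neg.fit(X[neg], labels[neg])
        return self

    def predict(self, X):
        out = []
        for x in np.asarray(X):
            x = x.reshape(1, -1)
            if self.stage1.predict(x)[0]:          # phase 1: neutral?
                out.append("neutral")
            elif self.stage2.predict(x)[0]:        # phase 2: positive valence?
                out.append(self.stage3_pos.predict(x)[0])  # phase 3: leaf emotion
            else:
                out.append(self.stage3_neg.predict(x)[0])
        return np.array(out)
```

Training each node only on the samples that reach it mirrors the paper's idea that later stages solve smaller, easier subtasks than a single flat multi-class classifier would face.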
