Emotion recognition on speech signals using machine learning

As human-machine interaction grows, speech analysis has become integral to narrowing the gap between the physical and digital worlds. An important subfield within this domain is the recognition of emotion in speech signals, a topic traditionally studied in linguistics and psychology. Speech emotion recognition has diverse applications. The prime objective of this paper is to recognize emotions in speech and classify them into seven output classes: anger, boredom, disgust, anxiety, happiness, sadness, and neutral. The proposed approach uses Mel Frequency Cepstral Coefficients (MFCC) and the energy of the speech signal as input features, and uses the Berlin Database of Emotional Speech. The features extracted from speech are combined into a feature vector, which is then used to train three classification algorithms: Support Vector Machine (SVM), Random Decision Forest, and Gradient Boosting. Random forest achieved the highest accuracy, predicting the correct emotion 81.05% of the time.
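As a rough illustration of the feature-extraction step described above, the following sketch computes per-frame MFCCs and log energy with NumPy and summarizes them into one fixed-length vector per utterance. The abstract does not give the paper's settings, so everything specific here is an assumption: the function name `utterance_features`, the frame length, hop size, FFT size, number of mel filters, number of coefficients, and the use of per-utterance mean and standard deviation as the summary.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters with centers evenly spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def utterance_features(signal, sample_rate, n_mfcc=13, n_mels=26,
                       frame_len=400, hop=160, n_fft=512):
    """Return mean and std of per-frame MFCCs and log energy (assumed summary).

    The signal must contain at least `frame_len` samples.
    """
    signal = np.asarray(signal, dtype=float)

    # Slice the signal into overlapping frames.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx]

    # Short-time log energy of each frame (small constant avoids log(0)).
    log_energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)

    # Power spectrum of Hamming-windowed frames, then mel filterbank energies.
    windowed = frames * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(windowed, n=n_fft, axis=1)) ** 2 / n_fft
    log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, sample_rate).T + 1e-10)

    # Unnormalized DCT-II over the mel axis gives the cepstral coefficients.
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    mfcc = log_mel @ dct_basis.T  # shape: (n_frames, n_mfcc)

    # One fixed-length vector per utterance: mean and std of each feature.
    per_frame = np.hstack([mfcc, log_energy[:, None]])
    return np.concatenate([per_frame.mean(axis=0), per_frame.std(axis=0)])
```

With the default settings this yields a 28-dimensional vector (13 MFCCs plus log energy, each summarized by mean and standard deviation). Vectors of this kind, stacked over all utterances, could then be passed to off-the-shelf classifiers such as scikit-learn's `SVC`, `RandomForestClassifier`, and `GradientBoostingClassifier` to compare accuracies; whether the paper used those particular implementations is not stated in the abstract.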
