Ensemble Learning of Hybrid Acoustic Features for Speech Emotion Recognition

Automatic emotion recognition is important for seamless interaction between human beings and intelligent robots, a step toward the full realization of a smart society. Signal processing and machine learning methods are widely applied to recognize human emotions from features extracted from facial images, video files, or speech signals. However, these features have not been able to recognize the fear emotion with the same level of precision as other emotions. The authors propose combining prosodic and spectral features, drawn from a group of carefully selected features, into hybrid acoustic features that improve emotion recognition. Experiments tested the effectiveness of the proposed features, which were extracted from the speech files of two public databases and used to train five popular ensemble learning algorithms. The results show that random decision forest ensemble learning of the proposed hybrid acoustic features is highly effective for speech emotion recognition.
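The idea of joining prosodic and spectral descriptors into one hybrid vector can be illustrated with a minimal sketch. It is not the authors' feature set: the function names, the frame length and hop, the 0.85 roll-off fraction, and the choice of short-time energy and zero-crossing rate as prosodic-style descriptors and of spectral centroid and roll-off as spectral descriptors are all assumptions made for illustration.

```python
import numpy as np


def _frames(signal, frame_len, hop):
    """Slice a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])


def prosodic_features(signal, frame_len=400, hop=160):
    """Mean and std of short-time RMS energy and zero-crossing rate (assumed descriptors)."""
    frames = _frames(signal, frame_len, hop)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Fraction of adjacent sample pairs whose sign differs, per frame.
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return np.array([rms.mean(), rms.std(), zcr.mean(), zcr.std()])


def spectral_features(signal, sample_rate, frame_len=400, hop=160, rolloff=0.85):
    """Mean and std of spectral centroid and roll-off frequency (assumed descriptors)."""
    frames = _frames(signal, frame_len, hop) * np.hanning(frame_len)
    mag = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    # Centroid: magnitude-weighted mean frequency of each frame.
    centroid = (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + 1e-12)
    # Roll-off: lowest frequency below which `rolloff` of the magnitude lies.
    cum = np.cumsum(mag, axis=1)
    idx = np.argmax(cum >= rolloff * cum[:, -1:], axis=1)
    roll = freqs[idx]
    return np.array([centroid.mean(), centroid.std(), roll.mean(), roll.std()])


def hybrid_features(signal, sample_rate):
    """Concatenate both descriptor groups into one hybrid feature vector."""
    return np.concatenate([
        prosodic_features(signal),
        spectral_features(signal, sample_rate),
    ])


# Usage on a synthetic 440 Hz tone standing in for a speech file.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)
vector = hybrid_features(tone, sample_rate)
print(vector.shape)
```

In a full pipeline, one such vector per utterance would form a row of the training matrix passed to an ensemble learner such as a random forest; that step is omitted here.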
