Paper information - An optimal model with a lower bound of recall for imbalanced speech emotion recognition


In an early complaint warning system, a common problem is the scarcity of angry-emotion samples for training classification models. Moreover, recognizing angry emotion is more important than recognizing non-angry emotion. The main purpose of this paper is therefore to train an optimal model that keeps recall above a lower bound while maximizing the F1 score. The work has three parts: 1) a variant of the F1 score (the TF1 score) that takes both a lower bound on recall and the F1 score into consideration; 2) a Single Emotion Deep Neural Network (SEDNN) and a training process designed to find an optimal model with the maximum TF1 score; 3) a performance comparison of different methods on the IEMOCAP and Emo-DB databases. Extensive experiments show that, with either a BCE (binary cross-entropy) loss function or a focal loss function, the training process can find a model with recall above a high threshold and a maximum F1 score. In particular, SEDNN with the focal loss function performs better than SEDNN with the BCE loss function.
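The two ingredients named in the abstract, a recall-constrained F1 variant (TF1) and the focal loss, can be illustrated with a minimal pure-Python sketch. The abstract does not give the exact TF1 formula, so the rule used here (return F1 when recall reaches the lower bound, otherwise 0) is an assumption, as are the function names and the default values `recall_lower_bound=0.8`, `gamma=2.0`, and `alpha=0.25`. The focal loss follows the standard binary form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t); it is not taken from the paper's own code.

```python
import math


def f1_and_recall(y_true, y_pred):
    """Return (F1, recall) for binary labels, with 1 = angry (positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return f1, recall


def thresholded_f1(y_true, y_pred, recall_lower_bound=0.8):
    """Assumed TF1 rule: F1 if recall meets the lower bound, else 0."""
    f1, recall = f1_and_recall(y_true, y_pred)
    return f1 if recall >= recall_lower_bound else 0.0


def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over predicted positive-class probabilities."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)      # clip to avoid log(0)
        p_t = p if t == 1 else 1.0 - p       # probability of the true class
        a_t = alpha if t == 1 else 1.0 - alpha
        total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)
```

Under this assumed rule, a model-selection loop would keep the checkpoint with the highest `thresholded_f1` on validation data, so any checkpoint whose recall falls below the bound is never selected regardless of its F1 score. Setting `gamma=0` and `alpha=0.5` reduces the focal loss to half of the ordinary BCE loss, which makes the two losses easy to compare.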

Victor S. Sheng | Charles X. Ling | Wei Fang | Xusheng Ai
