Emotion recognition model based on the Dempster–Shafer evidence theory

Abstract. Automatic emotion recognition for video clips has become a popular research area in recent years. Previous studies have explored emotion recognition through monomodal approaches such as voice, text, facial expression, and physiological signals. We focus on the complementarity of multimodal information and construct an automatic emotion recognition model based on deep learning and a multimodal fusion strategy. In this model, visual, audio, and text features are extracted from the video clips. A decision-level fusion strategy based on Dempster–Shafer evidence theory is proposed to fuse the multiple classification results. To address the problem of conflicting evidence in evidence theory, we propose a compatibility algorithm that corrects conflicting evidence using a similarity matrix computed over the evidence. This approach is shown to improve the accuracy of emotion recognition.
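
The abstract describes decision-level fusion with Dempster–Shafer theory and a similarity-matrix correction for conflicting evidence, but gives no formulas. The minimal sketch below illustrates the general idea for masses restricted to singleton (Bayesian) hypotheses: the cosine-similarity credibility weighting, the Murphy-style repeated combination, and the example mass vectors are illustrative assumptions, not the paper's exact compatibility algorithm.

```python
import numpy as np

EMOTIONS = ["happy", "sad", "angry", "neutral"]  # hypothetical label set

def dempster_combine(m1, m2):
    """Combine two mass vectors over singleton hypotheses with Dempster's rule."""
    # Conflict mass K: total mass assigned to pairs of incompatible singletons.
    K = sum(m1[i] * m2[j] for i in range(len(m1)) for j in range(len(m2)) if i != j)
    if np.isclose(K, 1.0):
        raise ValueError("Total conflict: Dempster's rule is undefined.")
    return np.array([m1[i] * m2[i] for i in range(len(m1))]) / (1.0 - K)

def similarity_weighted_evidence(masses):
    """Correct conflicting evidence by weighting each source with its average
    similarity (credibility) to the other sources, then averaging the masses.
    The cosine-similarity credibility scheme here is an assumed stand-in for
    the compatibility algorithm mentioned in the abstract."""
    masses = np.asarray(masses, dtype=float)
    n = len(masses)
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            sim[i, j] = masses[i] @ masses[j] / (
                np.linalg.norm(masses[i]) * np.linalg.norm(masses[j]) + 1e-12)
    support = sim.sum(axis=1) - 1.0        # support received from the other sources
    credibility = support / support.sum()  # normalised credibility weights
    return credibility @ masses            # weighted-average mass vector

if __name__ == "__main__":
    # Hypothetical per-modality classifier outputs (visual, audio, text).
    m_visual = np.array([0.6, 0.1, 0.2, 0.1])
    m_audio  = np.array([0.1, 0.7, 0.1, 0.1])  # conflicts with the visual evidence
    m_text   = np.array([0.5, 0.2, 0.2, 0.1])
    corrected = similarity_weighted_evidence([m_visual, m_audio, m_text])
    # Murphy-style fusion: combine the corrected mass with itself n-1 times.
    fused = dempster_combine(dempster_combine(corrected, corrected), corrected)
    print(dict(zip(EMOTIONS, np.round(fused, 3))))
```

In this toy run the conflicting audio evidence is down-weighted by its lower credibility before Dempster's rule is applied, so the fused decision still favours the class supported by the visual and text modalities.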
