A snapshot research and implementation of multimodal information fusion for data-driven emotion recognition

Abstract With the rapid development of artificial intelligence and the mobile Internet, new requirements for human-computer interaction have emerged, and personalized emotional interaction services are a new trend in the field. As the basis of emotional interaction, emotion recognition has likewise advanced rapidly alongside artificial intelligence. Current research mostly focuses on single-modal recognition, such as facial expression recognition, speech emotion recognition, body gesture recognition, and physiological signal recognition. However, the limited emotional information carried by a single modality, together with its vulnerability to various external factors, leads to lower recognition accuracy. Multimodal information fusion for data-driven emotion recognition has therefore been attracting growing attention in the affective computing field. This paper reviews the development background and research hot spots of data-driven multimodal emotion information fusion. With real-time mental health monitoring systems in mind, it discusses and summarizes in detail the current development of multimodal emotion datasets, multimodal feature extraction (including EEG, speech, facial expression, and text features), and multimodal fusion strategies and recognition methods. The main objective of this work is to present a clear account of the open scientific problems and future research directions in multimodal information fusion for data-driven emotion recognition.
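
To make the fusion strategies surveyed here concrete, the minimal sketch below contrasts the two most common schemes: feature-level (early) fusion and decision-level (late) fusion. It is an illustrative assumption rather than any specific method from the reviewed literature: the random EEG and speech arrays are placeholders for features already extracted per modality, and logistic regression with simple probability averaging stands in for the deep models discussed in the survey.

```python
# Sketch of feature-level (early) vs. decision-level (late) fusion.
# Assumes per-modality features have already been extracted; the random
# arrays below are placeholders for real EEG / speech features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_samples, n_classes = 200, 4                 # e.g. happy / sad / angry / neutral
eeg    = rng.normal(size=(n_samples, 32))     # placeholder EEG features
speech = rng.normal(size=(n_samples, 40))     # placeholder speech features (e.g. MFCC statistics)
labels = rng.integers(0, n_classes, size=n_samples)

idx_train, idx_test = train_test_split(np.arange(n_samples), test_size=0.3, random_state=0)

# --- Feature-level fusion: concatenate modality features, train one classifier ---
fused = np.hstack([eeg, speech])
clf_early = LogisticRegression(max_iter=1000).fit(fused[idx_train], labels[idx_train])
acc_early = accuracy_score(labels[idx_test], clf_early.predict(fused[idx_test]))

# --- Decision-level fusion: one classifier per modality, average the class probabilities ---
clf_eeg    = LogisticRegression(max_iter=1000).fit(eeg[idx_train], labels[idx_train])
clf_speech = LogisticRegression(max_iter=1000).fit(speech[idx_train], labels[idx_train])
probs = (clf_eeg.predict_proba(eeg[idx_test]) + clf_speech.predict_proba(speech[idx_test])) / 2
pred_late = clf_eeg.classes_[probs.argmax(axis=1)]
acc_late = accuracy_score(labels[idx_test], pred_late)

print(f"feature-level fusion accuracy: {acc_early:.2f}")
print(f"decision-level fusion accuracy: {acc_late:.2f}")
```

The same split carries over to deep architectures: early fusion concatenates modality embeddings before a shared classifier, while late fusion combines per-modality predictions, often with learned rather than equal weights.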
