A Hybrid Latent Space Data Fusion Method for Multimodal Emotion Recognition

Multimodal emotion recognition is an emerging interdisciplinary field at the intersection of affective computing and sentiment analysis. It aims to exploit the complementary information carried by signals of different modalities to make emotion recognition systems more accurate, which hinges on a powerful multimodal fusion method. In this study, a hybrid multimodal data fusion method is proposed in which the audio and visual modalities are first fused using a latent-space linear map; the features projected into this cross-modal space are then fused with the textual modality using a Dempster-Shafer (DS) theory-based evidential fusion method. Evaluation on the videos of the DEAP dataset shows the superiority of the proposed method over both decision-level and non-latent-space fusion methods. Furthermore, the results reveal that employing Marginal Fisher Analysis (MFA) for feature-level audio-visual fusion yields a greater improvement than cross-modal factor analysis (CFA) or canonical correlation analysis (CCA). Finally, the results show that exploiting users' textual comments alongside the audiovisual content of movies further improves the performance of the system.
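The two fusion stages the abstract describes can be illustrated with a minimal NumPy sketch: a CCA-style linear map that projects audio and visual features into a shared latent space, followed by Dempster's rule of combination to merge an audiovisual mass function with a textual one. This is an illustrative reconstruction, not the paper's implementation; the function names, the regularization term, and the binary valence frame {pos, neg} are assumptions made here for the example.

```python
import numpy as np

def cca_projections(X, Y, k=2, reg=1e-6):
    """Latent-space linear map (CCA): project audio features X (n x p) and
    visual features Y (n x q) into a shared k-dimensional cross-modal space.
    Returns the two projected feature sets and the canonical correlations."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])  # regularized covariances
    Syy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / n

    def inv_sqrt(S):  # symmetric inverse square root via eigendecomposition
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, s, Vt = np.linalg.svd(Wx @ Sxy @ Wy)  # SVD of whitened cross-covariance
    A = Wx @ U[:, :k]       # audio projection matrix
    B = Wy @ Vt[:k].T       # visual projection matrix
    return Xc @ A, Yc @ B, s[:k]

def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions over the same
    frame of discernment; keys are frozensets of class labels."""
    combined, conflict = {}, 0.0
    for a, pa in m1.items():
        for b, pb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + pa * pb
            else:
                conflict += pa * pb  # mass assigned to the empty set
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Hypothetical usage: combine an audiovisual classifier's evidence with a
# textual (user-comment) classifier's evidence over a binary valence frame.
pos, neg = frozenset({"pos"}), frozenset({"neg"})
both = pos | neg  # ignorance: mass on the whole frame
m_audiovisual = {pos: 0.6, neg: 0.3, both: 0.1}
m_textual = {pos: 0.5, neg: 0.2, both: 0.3}
m_fused = dempster_combine(m_audiovisual, m_textual)
```

Keeping some mass on the whole frame (`both`) is what distinguishes this evidential fusion from a plain product of posteriors: a weak textual classifier can express ignorance instead of being forced to commit to a class.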
