Learning Visual Emotion Distributions via Multi-Modal Features Fusion

Current image emotion recognition works mainly classified the images into one dominant emotion category, or regressed the images with average dimension values by assuming that the emotions perceived among different viewers highly accord with each other. However, due to the influence of various personal and situational factors, such as culture background and social interactions, different viewers may react totally different from the emotional perspective to the same image. In this paper, we propose to formulate the image emotion recognition task as a probability distribution learning problem. Motivated by the fact that image emotions can be conveyed through different visual features, such as aesthetics and semantics, we present a novel framework by fusing multi-modal features to tackle this problem. In detail, weighted multi-modal conditional probability neural network (WMMCPNN) is designed as the learning model to associate the visual features with emotion probabilities. By jointly exploring the complementarity and learning the optimal combination coefficients of different modality features, WMMCPNN could effectively utilize the representation ability of each uni-modal feature. We conduct extensive experiments on three publicly available benchmarks and the results demonstrate that the proposed method significantly outperforms the state-of-the-art approaches for emotion distribution prediction.

[1]  James Ze Wang,et al.  On shape and the computability of emotions , 2012, ACM Multimedia.

[2]  Hongxun Yao,et al.  Predicting discrete probability distribution of image emotions , 2015, 2015 IEEE International Conference on Image Processing (ICIP).

[3]  Jim Dowling,et al.  Predicting probability distributions for surf height using an ensemble of mixture density networks , 2005, ICML.

[4]  Jufeng Yang,et al.  Discovering affective regions in deep convolutional neural networks for visual sentiment prediction , 2016, 2016 IEEE International Conference on Multimedia and Expo (ICME).

[5]  Juan-Zi Li,et al.  How Do Your Friends on Social Media Disclose Your Emotions? , 2014, AAAI.

[6]  Allan Hanbury,et al.  Affective image classification using features inspired by psychology and art theory , 2010, ACM Multimedia.

[7]  Yue Gao,et al.  Predicting Personalized Emotion Perceptions of Social Images , 2016, ACM Multimedia.

[8]  Meng Wang,et al.  Unified Video Annotation via Multigraph Learning , 2009, IEEE Transactions on Circuits and Systems for Video Technology.

[9]  Yue Gao,et al.  Predicting Personalized Image Emotion Perceptions in Social Networks , 2018, IEEE Transactions on Affective Computing.

[10]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[11]  Hongxun Yao,et al.  Affective Image Retrieval via Multi-Graph Learning , 2014, ACM Multimedia.

[12]  Bernhard Schölkopf,et al.  Learning with Hypergraphs: Clustering, Classification, and Embedding , 2006, NIPS.

[13]  Min-Ling Zhang,et al.  Lift: Multi-Label Learning with Label-Specific Features , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Jiebo Luo,et al.  Building a Large Scale Dataset for Image Emotion Recognition: The Fine Print and The Benchmark , 2016, AAAI.

[15]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[16]  Alessandro Vinciarelli,et al.  A Survey of Personality Computing , 2014, IEEE Transactions on Affective Computing.

[17]  Sam J. Maglio,et al.  Emotional category data on images from the international affective picture system , 2005, Behavior research methods.

[18]  Tao Chen,et al.  Object-Based Visual Sentiment Concept Analysis and Application , 2014, ACM Multimedia.

[19]  Yeshaiahu Fainman,et al.  A learning law for density estimation , 1994, IEEE Trans. Neural Networks.

[20]  Juhan Nam,et al.  Multimodal Deep Learning , 2011, ICML.

[21]  Martin A. Riedmiller,et al.  A direct adaptive method for faster backpropagation learning: the RPROP algorithm , 1993, IEEE International Conference on Neural Networks.

[22]  Tsuhan Chen,et al.  A mixed bag of emotions: Model, predict, and transfer emotion distributions , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Nicu Sebe,et al.  Recognizing Emotions from Abstract Paintings Using Non-Linear Matrix Completion , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Sonja Grün,et al.  Impact of Spike Train Autostructure on Probability Distribution of Joint Spike Events , 2013, Neural Computation.

[25]  P. Ekman An argument for basic emotions , 1992 .

[26]  Zhi-Hua Zhou,et al.  Facial Age Estimation by Learning from Label Distributions , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Yue Gao,et al.  Exploring Principles-of-Art Features For Image Emotion Recognition , 2014, ACM Multimedia.

[28]  Jiebo Luo,et al.  Robust Visual-Textual Sentiment Analysis: When Attention meets Tree-structured Recursive Neural Networks , 2016, ACM Multimedia.

[29]  Johannes Wagner,et al.  Exploring Fusion Methods for Multimodal Emotion Recognition with Missing Data , 2011, IEEE Transactions on Affective Computing.

[30]  Yue Gao,et al.  Continuous Probability Distribution Prediction of Image Emotions via Multitask Shared Sparse Regression , 2017, IEEE Transactions on Multimedia.

[31]  H. Schlosberg Three dimensions of emotion. , 1954, Psychological review.

[32]  Rongrong Ji,et al.  Large-scale visual sentiment ontology and detectors using adjective noun pairs , 2013, ACM Multimedia.

[33]  Xin Geng,et al.  Label Distribution Learning , 2013, 2013 IEEE 13th International Conference on Data Mining Workshops.

[34]  Hui Chen,et al.  Temporal-Difference Learning With Sampling Baseline for Image Captioning , 2018, AAAI.

[35]  Belur V. Dasarathy,et al.  Medical Image Fusion: A survey of the state of the art , 2013, Inf. Fusion.

[36]  Tao Mei,et al.  Beyond Object Recognition: Visual Sentiment Analysis with Deep Coupled Adjective and Noun Neural Networks , 2016, IJCAI.

[37]  Yue Gao,et al.  Multimedia Social Event Detection in Microblog , 2015, MMM.

[38]  Hui Tian,et al.  Cumulative Probability Distribution Model for Evaluating User Behavior Prediction Algorithms , 2013, 2013 International Conference on Social Computing.

[39]  James Hays,et al.  SUN attribute database: Discovering, annotating, and recognizing scene attributes , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Jiebo Luo,et al.  Aesthetics and Emotions in Images , 2011, IEEE Signal Processing Magazine.

[41]  Qiang Liu,et al.  Reference Based LSTM for Image Captioning , 2017, AAAI.

[42]  Yue Gao,et al.  Approximating Discrete Probability Distribution of Image Emotions by Multi-Modal Features Fusion , 2017, IJCAI.