Multi-level region-based Convolutional Neural Network for image emotion classification

Abstract: Analyzing the emotional information in visual content has attracted growing attention because internet users increasingly share their feelings through images and videos online. In this paper, we investigate affective image analysis, a problem made challenging by its complexity and subjectivity. Previous research shows that image emotion is related to visual features ranging from low level to high level, in both global and local views, yet most current approaches focus only on improving emotion recognition with single-level visual features from a global view. To utilize different levels of visual features from both global and local views, we propose a multi-level region-based Convolutional Neural Network (CNN) framework that discovers the sentimental response of local regions. We first employ a Feature Pyramid Network (FPN) to extract multi-level deep representations. Then, an emotional region proposal method generates suitable local regions and removes excessive non-emotional regions for image emotion classification. Finally, to deal with the subjectivity of emotional labels, we propose a multi-task loss function that takes into account the probabilities of an image belonging to different emotion classes. Extensive experiments show that our method outperforms state-of-the-art approaches on several commonly used benchmark datasets.
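
The abstract does not give the form of the multi-task loss, so the following is only a minimal sketch of one plausible reading: a weighted sum of a cross-entropy term on the dominant emotion label and a KL-divergence term against a per-image emotion probability distribution. The function names, the `lam` weighting parameter, and the choice of KL divergence are all assumptions made for illustration, not the formulation used in the paper.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multi_task_loss(logits, hard_label, label_dist, lam=0.5):
    """Hypothetical multi-task loss for one image.

    logits     : raw scores, one per emotion class
    hard_label : index of the dominant emotion class
    label_dist : probabilities of the image belonging to each class
    lam        : assumed weight balancing the two terms (0 <= lam <= 1)
    """
    eps = 1e-12  # guards against log(0)
    p = softmax(logits)

    # Task 1: classification against the single dominant label.
    ce = -math.log(p[hard_label] + eps)

    # Task 2: match the predicted distribution to the label distribution.
    kl = sum(
        q * math.log(q / (pi + eps))
        for q, pi in zip(label_dist, p)
        if q > 0
    )

    return (1 - lam) * ce + lam * kl
```

Under these assumptions, setting `lam=0` reduces the loss to plain cross-entropy on the dominant label, while `lam=1` trains purely against the emotion distribution; intermediate values trade the two objectives off against each other.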
