WSCNet: Weakly Supervised Coupled Networks for Visual Sentiment Classification and Detection

Automatic assessment of sentiment from visual content has gained considerable attention as people increasingly express opinions online. In this paper, we address the problem of visual sentiment analysis, which is challenging due to the high-level abstraction involved in the recognition process. Existing methods based on convolutional neural networks learn sentiment representations from the holistic image, even though different image regions can influence the evoked sentiment to different degrees. We introduce a weakly supervised coupled convolutional network (WSCNet) that automatically selects relevant soft proposals given only weak annotations (e.g., global image labels), thereby significantly reducing the annotation burden. The method makes the following contributions. First, WSCNet detects a sentiment-specific soft map by training a fully convolutional network with a cross-spatial pooling strategy in the detection branch. Second, both holistic and localized information are utilized by coupling the sentiment map with deep features to form a semantic vector in the classification branch. The sentiment detection and classification branches are integrated into a unified deep framework that is optimized end to end. Extensive experiments demonstrate that the proposed WSCNet outperforms state-of-the-art methods on seven benchmark datasets.
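The two branches described above can be illustrated with a minimal NumPy sketch of a single forward pass. The abstract does not specify the operations, so everything concrete here is an assumption made for illustration: the detection branch is taken to be a 1x1 convolution producing `k` maps per class, cross-spatial pooling is taken to be an average over those `k` maps followed by an average over spatial positions, the sentiment map is taken to be the score-weighted sum of the per-class maps, and the coupling is taken to be element-wise reweighting of the features followed by concatenation with the holistic pooled features. The function names, weight shapes, and random inputs are hypothetical and do not come from the paper.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def detection_branch(features, w_det, num_classes, k):
    """Assumed detection branch: 1x1 conv, then cross-spatial pooling.

    features: (H, W, D) feature maps from a backbone.
    w_det:    (D, num_classes * k) weights of a 1x1 convolution (hypothetical).
    """
    h, w, _ = features.shape
    maps = (features @ w_det).reshape(h, w, num_classes, k)
    class_maps = maps.mean(axis=3)          # average the k maps of each class
    scores = class_maps.mean(axis=(0, 1))   # average over spatial positions
    probs = softmax(scores)                 # image-level class scores
    sentiment_map = class_maps @ probs      # (H, W) score-weighted soft map
    return probs, sentiment_map

def classification_branch(features, sentiment_map, w_cls):
    """Assumed coupling: reweight features by the map, pool, concatenate."""
    localized = features * sentiment_map[..., None]
    holistic_vec = features.mean(axis=(0, 1))       # (D,)
    localized_vec = localized.mean(axis=(0, 1))     # (D,)
    semantic = np.concatenate([holistic_vec, localized_vec])  # (2D,)
    return softmax(semantic @ w_cls)

# Toy inputs standing in for backbone features and learned weights.
rng = np.random.default_rng(0)
features = rng.random((7, 7, 16))
w_det = rng.standard_normal((16, 2 * 4))   # 2 classes, k = 4 maps per class
w_cls = rng.standard_normal((32, 2))

det_probs, sentiment_map = detection_branch(features, w_det, num_classes=2, k=4)
cls_probs = classification_branch(features, sentiment_map, w_cls)
```

In an end-to-end setup, both `det_probs` and `cls_probs` would be trained against the same image-level label, which is how weak supervision reaches the soft map without any region annotations.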
