Learning Discriminative Sentiment Representation from Strongly- and Weakly Supervised CNNs

Visual sentiment analysis is attracting increasing attention as the number of images uploaded to social networks grows rapidly. Learning rich visual representations often requires training deep convolutional neural networks (CNNs) on massive manually labeled data, which is expensive or scarce, especially for a subjective task like visual sentiment analysis. Meanwhile, a large quantity of social images is readily available, although noisy: querying social networks with the sentiment categories as keywords easily collects various types of images related to a specific sentiment. In this article, we propose a multiple kernel network for visual sentiment recognition, which learns representations from strongly and weakly supervised CNNs. Specifically, the weakly supervised deep model is trained on the large-scale data from social images, whereas the strongly supervised deep model is fine-tuned on the affective datasets with manual annotation. We employ the multiple kernel scheme on multiple layers of the CNNs, which automatically selects the discriminative representation by learning a linear combination of a set of pre-defined kernels. In addition, we introduce a large-scale dataset collected from popular comics of various countries, such as America, Japan, China, and France, which consists of 11,821 images with various artistic styles. Experimental results show that the multiple kernel network achieves consistent improvements over the state-of-the-art methods on the public affective datasets, as well as on the newly established Comics dataset. The Comics dataset can be found at http://cv.nankai.edu.cn/projects/Comic.
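
The idea of a linear combination of pre-defined kernels can be illustrated with a minimal sketch. It is not the authors' implementation: the toy feature vectors, the choice of a linear and an RBF kernel, the helper names, and the hand-set weights are all assumptions made for illustration, whereas in the multiple kernel scheme described above the combination weights would be learned.

```python
import math


def linear_kernel(x, y):
    """Dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an illustrative value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram(kernel, xs):
    """Gram matrix of one kernel over a list of feature vectors."""
    return [[kernel(a, b) for b in xs] for a in xs]


def combine_kernels(grams, weights):
    """Convex combination K = sum_m beta_m * K_m.

    Weights must be non-negative; they are normalized to sum to 1.
    Here they are fixed by hand (assumption); a multiple kernel
    learner would optimize them on training data instead.
    """
    if len(grams) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    betas = [w / total for w in weights]
    n = len(grams[0])
    return [
        [sum(b * g[i][j] for b, g in zip(betas, grams)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical features for three images, standing in for activations
# taken from a strongly supervised and a weakly supervised CNN.
strong_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weak_feats = [[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]]

grams = [gram(linear_kernel, strong_feats), gram(rbf_kernel, weak_feats)]
combined = combine_kernels(grams, [0.7, 0.3])
```

A combined Gram matrix of this kind could then be passed to a kernel classifier that accepts precomputed kernels. Constraining the weights to be non-negative keeps the combination a valid kernel, since a non-negative sum of positive semidefinite matrices is itself positive semidefinite.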
