Median-Based Multi-Label Prediction by Inflating Emotions with Dyads for Visual Sentiment Analysis

Visual sentiment analysis, which estimates the sentiments evoked by images, is an interesting and challenging research problem. Most studies have focused on estimating a few specific sentiments and their intensities, whereas multi-label sentiment estimation from images has not been sufficiently investigated. The purpose of this research is to accurately estimate sentiments as a multi-label, multi-class problem for images that evoke several different emotions simultaneously. We first introduce an emotion inflation method that expands the six emotions defined in the Emotion6 dataset into 13 emotions (which we call ‘Transf13’) by means of emotional dyads. We then perform multi-label sentiment analysis on the emotion-inflated dataset, for which we propose a combined deep neural network model that accepts both hand-crafted features (e.g., Bag of Visual Words (BoVW) features) and CNN features as inputs. We also introduce a median-based multi-label prediction algorithm, in which we assume that the intensity of each emotion follows a probability distribution. In other words, after training our deep neural network, we predict that a given unseen image evokes an emotion if the predicted intensity of that emotion is larger than the median of the corresponding emotion's distribution. Experimental results demonstrate that our model outperforms existing state-of-the-art algorithms in terms of subset accuracy.
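The median-based decision rule and the subset-accuracy metric mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: we assume the trained network outputs one intensity per emotion (13 for Transf13), that each emotion's median is computed over the training-set intensities of that emotion, and that "larger than" means a strict inequality. The function names `emotion_medians`, `predict_labels`, and `subset_accuracy` are hypothetical, and the toy numbers are invented for illustration only.

```python
from statistics import median
from typing import List, Sequence


def emotion_medians(train_intensities: Sequence[Sequence[float]]) -> List[float]:
    """Return one median per emotion (column), computed over the training rows.

    Assumption: medians are taken over training-set intensities; the abstract
    only says "the median of the corresponding emotion".
    """
    n_emotions = len(train_intensities[0])
    return [median([row[k] for row in train_intensities]) for k in range(n_emotions)]


def predict_labels(pred: Sequence[float], medians: Sequence[float]) -> List[int]:
    """Binary multi-label prediction: emotion k is present (1) iff its
    predicted intensity is strictly larger than that emotion's median."""
    if len(pred) != len(medians):
        raise ValueError("prediction and median vectors must have the same length")
    return [1 if p > m else 0 for p, m in zip(pred, medians)]


def subset_accuracy(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of images whose entire predicted label vector equals the ground truth."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


if __name__ == "__main__":
    # Toy intensities for 3 emotions (hypothetical numbers, not Emotion6 data).
    train = [
        [0.10, 0.50, 0.05],
        [0.30, 0.20, 0.40],
        [0.60, 0.10, 0.20],
    ]
    meds = emotion_medians(train)                      # [0.3, 0.2, 0.2]
    labels = predict_labels([0.45, 0.15, 0.25], meds)  # [1, 0, 1]
    print(meds, labels)
    print(subset_accuracy([[1, 0, 1], [0, 1, 0]], [labels, [0, 1, 1]]))  # 0.5
```

A per-emotion median threshold adapts to each emotion's typical intensity level, whereas a single global cutoff (such as 0.5) would treat rarely intense emotions the same as frequently intense ones. Note that with an even number of training rows, `statistics.median` averages the two middle values. Subset accuracy is a strict metric: an image counts as correct only if every one of its emotion labels is predicted correctly.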
