Exploiting objective text description of images for visual sentiment analysis

This paper addresses the problem of Visual Sentiment Analysis, focusing on the estimation of the polarity of the sentiment evoked by an image. Starting from an embedding approach that exploits both visual and textual features, we attempt to boost the contribution of each input view. We propose to extract and employ an Objective Text description of images, rather than the classic Subjective Text provided by users (i.e., title, tags and image description), which is extensively exploited in the state of the art to infer the sentiment associated with social images. The Objective Text is obtained from the visual content of the images through recent deep learning architectures for object classification, scene recognition and image captioning. Objective Text features are then combined with visual features in an embedding space obtained with Canonical Correlation Analysis, and the sentiment polarity is inferred by a supervised Support Vector Machine. During the evaluation, we compared an extensive number of text and visual feature combinations, as well as baselines obtained from state-of-the-art methods. Experiments performed on a representative dataset of 47,235 labelled samples demonstrate that exploiting Objective Text helps to outperform the state of the art in sentiment polarity estimation.
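
The pipeline described above (multimodal CCA embedding followed by an SVM) can be illustrated with a minimal sketch. This is not the authors' implementation: scikit-learn's CCA and LinearSVC are assumed as stand-ins for the embedding and classifier, and the random arrays are hypothetical placeholders for the CNN visual descriptors and the bag-of-words encoding of the machine-generated Objective Text (object labels, scene labels and captions).

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Placeholder features: in the actual pipeline, the visual view would be a CNN
# descriptor of the image and the textual view a bag-of-words / TF-IDF encoding
# of the Objective Text generated by object, scene and captioning networks.
n_train, n_test = 400, 100
visual_train = rng.standard_normal((n_train, 512))   # visual descriptors (stand-in)
text_train   = rng.standard_normal((n_train, 300))   # Objective Text descriptors (stand-in)
y_train      = rng.integers(0, 2, n_train)           # sentiment polarity labels (0 = negative, 1 = positive)

visual_test = rng.standard_normal((n_test, 512))
text_test   = rng.standard_normal((n_test, 300))
y_test      = rng.integers(0, 2, n_test)

# 1) Learn a joint embedding with Canonical Correlation Analysis: find projections
#    of the two views that are maximally correlated on the training set.
cca = CCA(n_components=32, max_iter=1000)
cca.fit(visual_train, text_train)

# 2) Project both views into the shared space and concatenate them as the
#    multimodal representation of each image.
def embed(visual, text):
    v_c, t_c = cca.transform(visual, text)
    return np.hstack([v_c, t_c])

X_train = embed(visual_train, text_train)
X_test  = embed(visual_test, text_test)

# 3) Infer sentiment polarity with a supervised linear SVM on the embedded features.
clf = LinearSVC(C=1.0)
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

With random placeholder features the accuracy is of course at chance level; the sketch only shows how the two views would be fused before classification.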
