Visual Sentiment Analysis Based on Objective Text Description of Images

Visual Sentiment Analysis aims to estimate the polarity of the sentiment evoked by images, in terms of positive or negative sentiment. To this aim, most state-of-the-art works exploit the text associated with a social post by its author. However, such textual data is typically noisy due to the subjectivity of the user, who often includes text intended to maximize the diffusion of the post rather than to describe its content. In this paper we automatically extract an Objective Text description of each image from its visual content and employ it in place of the classic Subjective Text provided by users. The proposed method defines a multimodal embedding space based on the contribution of both visual and textual features. The sentiment polarity is then inferred by a supervised Support Vector Machine trained on representations in the obtained embedding space. Experiments performed on a representative dataset of 47,235 labelled samples demonstrate that the exploitation of the proposed Objective Text helps to outperform the state of the art in sentiment polarity estimation.
