Multilingual visual sentiment concept clustering and analysis

Visual content is a rich medium that can communicate not only facts and events but also emotions and opinions. In some cases, visual content carries a universal affective bias (e.g., natural disasters or beautiful scenes). Often, however, ensuring that the affect a piece of visual media evokes in its recipient matches the one its author intended requires a deep understanding, and even sharing, of cultural backgrounds. In this study, we propose a computational framework for the clustering and analysis of multilingual visual affective concepts used in different languages, which enables us to pinpoint alignable differences (via similar concepts) and nonalignable differences (via unique concepts) across cultures. To do so, we crowdsource sentiment labels for the MVSO dataset, which contains 16K multilingual visual sentiment concepts and 7.3M images tagged with these concepts. We then represent these concepts in a distribution-based word vector space via (1) pivotal translation or (2) cross-lingual semantic alignment. We evaluate these representations on three tasks, all across languages: affective concept retrieval, concept clustering, and sentiment prediction. The proposed clustering framework enables both quantitative and qualitative analysis of this large multilingual dataset. We also present a novel use case based on a subset of facial images and explore cultural insights about visual sentiment concepts in such portrait-focused images.
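
To make the pivotal-translation route more concrete, the following is a minimal sketch of cross-lingual concept retrieval, not the implementation used in the study. It assumes each concept is a short phrase, translates its words into English through a small dictionary, averages the English word vectors, and retrieves the nearest concept by cosine similarity. The vectors, the dictionary entries, and the names `TOY_VECTORS`, `PIVOT`, `concept_vector`, `cosine`, and `retrieve` are all hypothetical and chosen only for illustration; averaging word vectors is likewise an assumption about how a phrase might be represented.

```python
import math

# Toy 3-dimensional English word vectors, invented for illustration only.
# They are NOT taken from MVSO or from any trained embedding model.
TOY_VECTORS = {
    "beautiful": [0.9, 0.1, 0.0],
    "happy": [0.8, 0.3, 0.1],
    "sad": [-0.7, 0.2, 0.1],
    "flower": [0.1, 0.9, 0.0],
    "face": [0.0, 0.2, 0.9],
}

# Hypothetical pivot dictionary: (language, word) -> English word.
PIVOT = {
    ("es", "hermosa"): "beautiful",
    ("es", "flor"): "flower",
    ("fr", "triste"): "sad",
    ("fr", "visage"): "face",
}


def concept_vector(lang, words):
    """Translate a concept's words to English, then average their vectors."""
    english = [w if lang == "en" else PIVOT[(lang, w)] for w in words]
    dims = len(TOY_VECTORS[english[0]])
    total = [0.0] * dims
    for word in english:
        for i, value in enumerate(TOY_VECTORS[word]):
            total[i] += value
    return [value / len(english) for value in total]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(query, candidates):
    """Return the candidate concept most similar to the query concept."""
    query_vec = concept_vector(*query)
    return max(candidates, key=lambda c: cosine(query_vec, concept_vector(*c)))


english_concepts = [
    ("en", ("beautiful", "flower")),
    ("en", ("sad", "face")),
    ("en", ("happy", "face")),
]
best = retrieve(("es", ("flor", "hermosa")), english_concepts)
print(best)
```

In this toy setup the Spanish query maps onto the same averaged vector as its English counterpart, so retrieval is trivially exact. With real embeddings, translation ambiguity and language-specific usage would make such matches approximate, which is the kind of gap that separates similar concepts from unique ones.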
