Multimodal Classification for Analysing Social Media

Classification of social media data is an important approach to understanding user behavior on the Web. Although information on social media can come in different modalities, such as text, images, audio, or video, traditional classification approaches usually leverage only one prominent modality. Techniques that can leverage multiple modalities are often complex and susceptible to the absence of some modalities. In this paper, we present simple models that combine information from different modalities to classify social media content and that address both of these problems. Our models combine the modalities using a pooling layer, and an auxiliary learning task is used to learn a common feature space. We demonstrate the performance of our models and their robustness to missing modalities in the emotion classification domain. Despite their simplicity, our approaches not only achieve significantly higher accuracy than traditional fusion approaches but also remain competitive when only one modality is available.
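To make the described architecture concrete, the sketch below gives one plausible reading of it in PyTorch: each modality is projected into a common feature space, the embeddings of whichever modalities are present are combined by a pooling layer, and an auxiliary loss pulls the per-modality embeddings together. The layer sizes, the choice of max pooling, the MSE auxiliary loss, and the class count are all illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of pooling-based multimodal fusion with an auxiliary
# common-space loss. All dimensions and the pooling/auxiliary choices
# are assumptions for illustration, not the authors' exact setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PooledMultimodalClassifier(nn.Module):
    def __init__(self, text_dim=300, image_dim=2048, common_dim=256, num_classes=8):
        super().__init__()
        # Modality-specific projections into a shared feature space.
        self.text_proj = nn.Linear(text_dim, common_dim)
        self.image_proj = nn.Linear(image_dim, common_dim)
        self.classifier = nn.Linear(common_dim, num_classes)

    def forward(self, text_feat=None, image_feat=None):
        # Project whichever modalities are present; absent ones are
        # simply skipped, which is what makes pooling robust to
        # missing modalities.
        embeddings = []
        if text_feat is not None:
            embeddings.append(self.text_proj(text_feat))
        if image_feat is not None:
            embeddings.append(self.image_proj(image_feat))
        assert embeddings, "at least one modality must be provided"

        stacked = torch.stack(embeddings, dim=0)  # (num_modalities, B, common_dim)
        pooled, _ = stacked.max(dim=0)            # max pooling over modalities
        logits = self.classifier(pooled)

        # Auxiliary task (assumed form): pull the per-modality embeddings
        # together so they occupy a common feature space.
        aux_loss = torch.tensor(0.0, device=pooled.device)
        if len(embeddings) == 2:
            aux_loss = F.mse_loss(embeddings[0], embeddings[1])
        return logits, aux_loss

# Example usage with pre-extracted features (e.g., word-embedding averages
# for text and CNN activations for images):
model = PooledMultimodalClassifier()
text = torch.randn(4, 300)
image = torch.randn(4, 2048)
logits, aux = model(text_feat=text, image_feat=image)
loss = F.cross_entropy(logits, torch.randint(0, 8, (4,))) + 0.1 * aux
```

Because the pooling step operates over however many modality embeddings are available, the same trained classifier can be applied when only text or only images are present, which matches the robustness claim above.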
