Ensemble learning on visual and textual data for social image emotion classification

Texts, images and other information are posted every day on social networks, providing a large amount of multimodal data. The aim of this work is to investigate whether combining and integrating visual and textual data makes it possible to identify the emotions elicited by an image. We focus on image emotion classification within eight emotion categories: amusement, awe, contentment, excitement, anger, disgust, fear and sadness. For this classification task we propose ensemble learning approaches based on the Bayesian model averaging method that combine five state-of-the-art classifiers. The proposed ensemble approaches consider predictions given by several classification models, based on visual and textual data, through late and early fusion schemes, respectively. Our investigations show that an ensemble method based on a late fusion of unimodal classifiers achieves high classification performance across all eight emotion classes. The improvement is larger when deep image representations are adopted as visual features than when hand-crafted ones are used.
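As a rough illustration of the late fusion idea, the sketch below averages the class posteriors of separately trained unimodal classifiers, weighting each model by an estimate of its reliability, in the spirit of Bayesian model averaging. The function name `late_fusion_bma`, the weight values and the toy probabilities are assumptions made for this example; the abstract does not specify how the model weights are computed or which five classifiers are used.

```python
import numpy as np

# The eight emotion categories named in the abstract.
EMOTIONS = ["amusement", "awe", "contentment", "excitement",
            "anger", "disgust", "fear", "sadness"]


def late_fusion_bma(probas, model_weights):
    """Fuse per-model class posteriors with a weighted average.

    probas: array-like of shape (n_models, n_samples, n_classes), where
        each row along the last axis is one model's posterior P(c | x, M_i).
    model_weights: array-like of shape (n_models,), a stand-in for the
        model posteriors P(M_i | D). How these are estimated is an
        assumption here (for example, from validation performance).
    Returns the fused posteriors and the index of the most probable class.
    """
    probas = np.asarray(probas, dtype=float)
    weights = np.asarray(model_weights, dtype=float)
    weights = weights / weights.sum()          # normalise so weights sum to 1
    # Contract over the model axis: sum_i w_i * P(c | x, M_i).
    fused = np.tensordot(weights, probas, axes=1)
    return fused, fused.argmax(axis=1)


if __name__ == "__main__":
    # Toy posteriors for one image from a visual and a textual classifier.
    visual = [[0.10, 0.50, 0.10, 0.10, 0.05, 0.05, 0.05, 0.05]]
    textual = [[0.05, 0.20, 0.05, 0.05, 0.05, 0.05, 0.50, 0.05]]
    fused, labels = late_fusion_bma([visual, textual], [0.7, 0.3])
    print(EMOTIONS[labels[0]], fused[0].round(3))
```

In this toy case the visual model favours awe and the textual model favours fear; because the visual model carries the larger (hypothetical) weight, the fused prediction is awe. An early fusion scheme would instead concatenate visual and textual features before training a single classifier.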
