Continuous Probability Distribution Prediction of Image Emotions via Multitask Shared Sparse Regression

Previous work on image emotion analysis has mainly focused on predicting the dominant emotion category or the average dimension values of an image, for affective image classification and regression. However, this is often insufficient in real-world applications, because the emotions that an image evokes are highly subjective and differ from viewer to viewer. In this paper, we propose to predict the continuous probability distribution of image emotions, represented in the dimensional valence-arousal space. We carry out a large-scale statistical analysis on the constructed Image-Emotion-Social-Net dataset and observe that the emotion distribution can be well modeled by a Gaussian mixture model, whose parameters are estimated by the expectation-maximization algorithm with specified initializations. We then extract commonly used emotion features at different levels for each image. Finally, we formalize the emotion distribution prediction task as a shared sparse regression (SSR) problem and extend it to a multitask setting, named multitask shared sparse regression (MTSSR), to explore the latent information shared between different prediction tasks. Both SSR and MTSSR are optimized by iteratively reweighted least squares. Experiments on the Image-Emotion-Social-Net dataset, with comparisons against three baselines, show quantitatively that the proposed method outperforms them.
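To make the Gaussian mixture step concrete, the following is a minimal sketch of expectation-maximization for a two-dimensional mixture fitted to valence-arousal points, started from caller-specified means. It is not the authors' implementation: the function names (`gaussian_pdf`, `fit_gmm`), the identity initial covariances, the fixed iteration count, the regularization constant, and the synthetic ratings on an assumed 1-9 scale are all illustrative assumptions.

```python
import math
import random


def gaussian_pdf(x, mean, cov):
    """Density of a 2-D Gaussian with full covariance ((a, b), (b, d))."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mean)^T cov^{-1} (x - mean) via the 2x2 inverse.
    q = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def fit_gmm(points, init_means, n_iter=50, reg=1e-6):
    """Fit a 2-D Gaussian mixture by EM, starting from the given means.

    Hypothetical helper: initial covariances are the identity and initial
    weights are uniform (assumptions, not taken from the paper).
    """
    k, n = len(init_means), len(points)
    means = [tuple(m) for m in init_means]
    covs = [((1.0, 0.0), (0.0, 1.0)) for _ in range(k)]
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in points:
            p = [weights[j] * gaussian_pdf(x, means[j], covs[j]) for j in range(k)]
            total = max(sum(p), 1e-300)  # guard against underflow to zero
            resp.append([pj / total for pj in p])

        # M-step: re-estimate weights, means, and covariances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-300)
            mx = sum(r[j] * x[0] for r, x in zip(resp, points)) / nj
            my = sum(r[j] * x[1] for r, x in zip(resp, points)) / nj
            sxx = sum(r[j] * (x[0] - mx) ** 2 for r, x in zip(resp, points)) / nj
            syy = sum(r[j] * (x[1] - my) ** 2 for r, x in zip(resp, points)) / nj
            sxy = sum(r[j] * (x[0] - mx) * (x[1] - my) for r, x in zip(resp, points)) / nj
            weights[j] = nj / n
            means[j] = (mx, my)
            # Small diagonal term keeps the covariance invertible.
            covs[j] = ((sxx + reg, sxy), (sxy, syy + reg))

    return weights, means, covs


# Synthetic (valence, arousal) ratings for one image: two viewer groups.
random.seed(0)
ratings = [(random.gauss(7.0, 0.5), random.gauss(5.0, 0.5)) for _ in range(200)]
ratings += [(random.gauss(3.0, 0.5), random.gauss(4.0, 0.5)) for _ in range(200)]

weights, means, covs = fit_gmm(ratings, init_means=[(6.0, 5.0), (4.0, 4.0)])
print("mixing weights:", [round(w, 3) for w in weights])
print("component means:", [(round(m[0], 2), round(m[1], 2)) for m in means])
```

The fitted weights, means, and covariances together describe one image's emotion distribution. In a setup like the one the abstract outlines, such parameters would plausibly be the quantities a regression model is trained to predict from image features, although the abstract does not spell out that mapping.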
