Visual affective classification by combining visual and text features

Affective analysis of images in social networks has drawn much attention. The text surrounding images has been shown to provide valuable semantic information about image content that low-level visual features can hardly represent. In this paper, we propose a novel approach to the visual affective classification (VAC) task. The approach combines visual representations with novel textual features through a fusion scheme based on Dempster-Shafer (D-S) Evidence Theory. Specifically, we not only investigate different types of visual features and fusion methods for VAC, but also propose textual features, based on word similarity, that effectively capture emotional semantics from the short text associated with images. Experiments are conducted on three publicly available databases: the International Affective Picture System (IAPS), the Artistic Photos set, and the MirFlickr Affect set. The results demonstrate that the proposed approach combining visual and textual features provides promising results for the VAC task.
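
The abstract names two technical ingredients: textual features derived from word similarity, and late fusion of visual and textual evidence with Dempster-Shafer theory. The following is a minimal sketch of how such a pipeline can be wired together. It is not the authors' implementation. The two-class frame, the seed words, the toy 2-d word vectors, the max-similarity feature in `text_feature`, the reliability discounting in `to_mass`, and the pignistic decision rule in `decide` are all illustrative assumptions. Only Dempster's rule of combination (`dempster_combine`) is the standard textbook rule.

```python
import math
from itertools import product

# Hypothetical two-class frame of discernment; the real emotion categories
# depend on the database and are not reproduced here.
CLASSES = ("positive", "negative")


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def text_feature(tokens, vectors, seed_words):
    """Assumed word-similarity feature: for each class, the highest similarity
    between any known token and any seed word of that class."""
    feats = []
    for cls in CLASSES:
        best = 0.0
        for tok in tokens:
            for seed in seed_words[cls]:
                if tok in vectors and seed in vectors:
                    best = max(best, cosine(vectors[tok], vectors[seed]))
        feats.append(best)
    return feats


def to_mass(probs, reliability):
    """Discount class probabilities into a mass function: reliability * p on
    each singleton, the remaining mass on the whole frame (ignorance)."""
    mass = {frozenset([c]): reliability * p for c, p in zip(CLASSES, probs)}
    mass[frozenset(CLASSES)] = 1.0 - reliability * sum(probs)
    return mass


def dempster_combine(m1, m2):
    """Dempster's rule: multiply masses of intersecting focal sets, then
    renormalise by 1 - K, where K is the mass falling on empty intersections."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def decide(mass):
    """Pick the class with the highest pignistic probability BetP."""
    bet = {c: 0.0 for c in CLASSES}
    for focal, w in mass.items():
        for c in focal:
            bet[c] += w / len(focal)
    return max(bet, key=bet.get)


# Toy "word vectors", invented for illustration only.
vectors = {"sunny": (0.9, 0.1), "smile": (0.8, 0.2),
           "happy": (1.0, 0.0), "sad": (0.0, 1.0)}
seeds = {"positive": ["happy"], "negative": ["sad"]}

feats = text_feature(["sunny", "smile", "tripod"], vectors, seeds)
text_probs = [f / sum(feats) for f in feats]  # normalise similarities
visual_probs = [0.4, 0.6]                     # hypothetical classifier posteriors

fused = dempster_combine(to_mass(visual_probs, 0.7), to_mass(text_probs, 0.6))
print(decide(fused))  # positive
```

In this toy case the visual classifier alone leans "negative", but the textual evidence is strong enough that the fused decision flips to "positive". The mass left on the whole frame lets a less reliable source express ignorance instead of forcing it to commit to a class.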
