Predicting memorability of images using attention-driven spatial pooling and image semantics

In daily life, humans demonstrate a remarkable ability to remember images they see in magazines, commercials, TV, web pages, and the like, yet automatic prediction of the intrinsic memorability of images using computer vision and machine learning techniques has been investigated only recently. Our goal in this article is to explore the role of visual attention and image semantics in understanding image memorability. In particular, we present an attention-driven spatial pooling strategy and show that considering image features from the salient parts of images improves on the results of previous models. We also investigate different semantic properties of images by analyzing a diverse set of recently proposed semantic features that encode meta-level object categories, scene attributes, and invoked feelings. We show that these automatically extracted features provide memorability predictions nearly as accurate as those derived from human annotations. Moreover, our combined model yields results superior to those of state-of-the-art fully automatic models.
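The core idea of attention-driven spatial pooling can be illustrated with a minimal sketch: instead of averaging local descriptors uniformly over the image, each spatial location is weighted by its predicted saliency, so descriptors from salient regions dominate the pooled representation. The sketch below assumes NumPy; the function name, array shapes, and the random toy inputs are illustrative, not the paper's actual implementation.

```python
import numpy as np

def attention_weighted_pooling(features, saliency):
    """Pool local descriptors weighted by a visual-saliency map.

    features : (H, W, D) array of local image descriptors
    saliency : (H, W) attention map with non-negative values
    Returns a single (D,) pooled descriptor.
    """
    # Normalise the saliency map into a spatial probability distribution
    w = saliency / (saliency.sum() + 1e-8)
    # Saliency-weighted sum over all spatial locations
    return np.tensordot(w, features, axes=([0, 1], [0, 1]))

# Toy example: a 4x4 grid of 8-dimensional descriptors
rng = np.random.default_rng(0)
feats = rng.random((4, 4, 8))
sal = rng.random((4, 4))
pooled = attention_weighted_pooling(feats, sal)
assert pooled.shape == (8,)
```

With a uniform saliency map this reduces to standard average pooling, which makes the scheme a drop-in replacement for the uniform spatial pooling used in earlier memorability models.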
