Visual Attention-Driven Spatial Pooling for Image Memorability

In daily life, humans demonstrate astounding ability to remember images they see on magazines, commercials, TV, the web and so on, but automatic prediction of intrinsic memorability of images using computer vision and machine learning techniques was not investigated until a few years ago. However, despite these recent advances, none of the available approaches makes use of any attentional mechanism, a fundamental aspect of human vision, which selects relevant image regions for higher-level processing. Our goal in this paper is to explore the role of visual attention in understanding memorability of images. In particular, we present an attention-driven spatial pooling strategy for image memorability and show that the regions estimated by bottom-up and object-level saliency maps are more effective in predicting memorability than considering a fixed spatial pyramid structure as in the previous studies.

[1]  Laurent Itti,et al.  Interesting objects are visually salient. , 2008, Journal of vision.

[2]  Hao Su,et al.  Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification , 2010, NIPS.

[3]  Yuning Jiang,et al.  Randomized Spatial Partition for Scene Recognition , 2012, ECCV.

[4]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  J. Henderson,et al.  Accurate visual memory for previously attended objects in natural scenes , 2002 .

[6]  P. Perona,et al.  Objects predict fixations better than early saliency. , 2008, Journal of vision.

[7]  Ming-Hsuan Yang,et al.  Top-down visual saliency via joint CRF and dictionary learning , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Kazuya Inoue,et al.  The role of attention in the contextual enhancement of visual memory for natural scenes , 2012 .

[9]  Aude Oliva,et al.  Visual long-term memory has a massive storage capacity for object details , 2008, Proceedings of the National Academy of Sciences.

[10]  Christof Koch,et al.  Learning visual saliency by combining feature maps in a nonlinear manner using AdaBoost. , 2012, Journal of vision.

[11]  Krista A. Ehinger,et al.  SUN database: Large-scale scene recognition from abbey to zoo , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[12]  A. Oliva,et al.  From Blobs to Boundary Edges: Evidence for Time- and Spatial-Scale-Dependent Scene Recognition , 1994 .

[13]  Eli Shechtman,et al.  Matching Local Self-Similarities across Images and Videos , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Carrick C. Williams,et al.  To see and remember: Visually specific information is retained in memory from previously attended objects in natural scenes , 2001, Psychonomic bulletin & review.

[15]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[16]  L. Standing Learning 10000 pictures , 1973 .

[17]  Antonio Torralba,et al.  Understanding the Intrinsic Memorability of Images , 2011, NIPS.

[18]  Trevor Darrell,et al.  Beyond spatial pyramids: Receptive field learning for pooled image features , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  George A. Alvarez,et al.  Natural-Scene Perception Requires Attention , 2011, Psychological science.

[20]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[21]  Aykut Erdem,et al.  Visual saliency estimation by nonlinearly integrating features using region covariances. , 2013, Journal of vision.

[22]  Fei-Fei Li,et al.  Object-Centric Spatial Pooling for Image Classification , 2012, ECCV.

[23]  J. Wolfe,et al.  Is visual attention required for robust picture memory? , 2007, Vision Research.

[24]  Ali Borji,et al.  State-of-the-Art in Visual Attention Modeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Thomas Deselaers,et al.  What is an object? , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[26]  Jianxiong Xiao,et al.  What makes an image memorable? , 2011, CVPR 2011.

[27]  J. Wolfe Visual memory: What do you know about what you saw? , 1998, Current Biology.

[28]  R. Shepard Recognition memory for words, sentences, and pictures , 1967 .

[29]  L. Standing Learning 10,000 pictures. , 1973, The Quarterly journal of experimental psychology.

[30]  Christof Koch,et al.  A Model of Saliency-Based Visual Attention for Rapid Scene Analysis , 2009 .

[31]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[32]  Antonio Torralba,et al.  Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search. , 2006, Psychological review.

[33]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[34]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[35]  Barbara Caputo,et al.  Indoor Scene Recognition using Task and Saliency-driven Feature Pooling , 2012, BMVC.

[36]  Jianxiong Xiao,et al.  Memorability of Image Regions , 2012, NIPS.

[37]  M. Potter Short-term conceptual memory for pictures. , 1976, Journal of experimental psychology. Human learning and memory.

[38]  Yasuo Kuniyoshi,et al.  Discriminative spatial pyramid , 2011, CVPR 2011.

[39]  Frédo Durand,et al.  Learning to predict where humans look , 2009, 2009 IEEE 12th International Conference on Computer Vision.