Understanding and Predicting the Memorability of Outdoor Natural Scenes

Memorability measures how easily an image is remembered after a brief glance, and understanding it may help in designing magazine covers, tourism publicity materials, and so forth. Recent works have shed light on the visual features that make generic images, object images, or face photographs memorable. However, these methods cannot effectively predict the memorability of outdoor natural scene images. To overcome this shortcoming, in this paper we attempt to answer the question: “what exactly makes outdoor natural scenes memorable?” To this end, we first establish a large-scale outdoor natural scene image memorability (LNSIM) database, containing 2,632 outdoor natural scene images with their ground-truth memorability scores and multi-label scene category annotations. Then, similar to previous works, we mine our database to investigate how low-, middle-, and high-level handcrafted features affect the memorability of outdoor natural scenes. In particular, we find that the high-level feature of scene category is rather well correlated with outdoor natural scene memorability, and that the deep features learned by a deep neural network (DNN) are also effective in predicting the memorability scores. Moreover, combining the deep features with the category feature further boosts the performance of memorability prediction. Therefore, we propose an end-to-end DNN-based outdoor natural scene memorability (DeepNSM) predictor, which takes advantage of the learned category-related features. Experimental results validate the effectiveness of our DeepNSM model, which outperforms the state-of-the-art methods. Finally, we seek to understand the reasons for the good performance of our DeepNSM model, and we study the cases in which it succeeds or fails to accurately predict the memorability of outdoor natural scenes.
