Paper information - Understanding and Predicting Image Memorability at a Large Scale

Progress in estimating visual memorability has been limited by the small scale and lack of variety of benchmark data. Here, we introduce a novel experimental procedure to objectively measure human memory, allowing us to build LaMem, the largest annotated image memorability dataset to date (containing 60,000 images from diverse sources). Using Convolutional Neural Networks (CNNs), we show that fine-tuned deep features outperform all other features by a large margin, reaching a rank correlation of 0.64, near human consistency (0.68). Analysis of the responses of the high-level CNN layers shows which objects and regions are positively, and negatively, correlated with memorability, allowing us to create memorability maps for each image and provide a concrete method to perform image memorability manipulation. This work demonstrates that one can now robustly estimate the memorability of images from many different classes, positioning memorability and deep memorability features as prime candidates to estimate the utility of information for cognitive systems. Our model and data are available at: http://memorability.csail.mit.edu.
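The abstract compares the model to human consistency through a rank correlation (0.64 versus 0.68). As a minimal sketch of how such a figure can be computed, assuming the metric is Spearman's rank correlation between predicted and measured memorability scores, the following pure-Python code ranks both score lists and correlates the ranks. The function names and the example scores are hypothetical and chosen only for illustration; they are not taken from the paper or from LaMem.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rank_correlation(predicted, measured):
    """Pearson correlation computed on the ranks of the two score lists."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    rp, rm = average_ranks(predicted), average_ranks(measured)
    n = len(rp)
    mean_p, mean_m = sum(rp) / n, sum(rm) / n
    cov = sum((a - mean_p) * (b - mean_m) for a, b in zip(rp, rm))
    var_p = sum((a - mean_p) ** 2 for a in rp)
    var_m = sum((b - mean_m) ** 2 for b in rm)
    if var_p == 0 or var_m == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (var_p * var_m) ** 0.5


if __name__ == "__main__":
    # Made-up scores for five images (illustrative only, not LaMem data).
    predicted = [0.91, 0.55, 0.72, 0.60, 0.83]
    measured = [0.88, 0.61, 0.58, 0.66, 0.79]
    print(spearman_rank_correlation(predicted, measured))  # prints 0.7
```

Because only the ordering of the scores matters, a model can reach a high rank correlation without matching the absolute memorability values, which is why the same statistic can be applied to both model predictions and split-half human consistency.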
