Image Memorability Using Diverse Visual Features and Soft Attention

In this paper we present a method for estimating the memorability of still images. The proposed solution exploits feature maps extracted from two Convolutional Neural Networks, pre-trained for object recognition and memorability estimation respectively. These feature maps are then weighted by a soft attention mechanism so that the model focuses on the image regions that are most informative for memorability. Results achieved on a benchmark dataset demonstrate the effectiveness of the proposed method.
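The abstract does not detail the architecture, but the core idea of soft spatial attention over pre-trained CNN feature maps can be illustrated with a minimal PyTorch sketch. The module below is an assumption for illustration only (class name, feature shapes, and the sigmoid regressor are hypothetical, not the authors' implementation): it scores each spatial location of a feature map, normalizes the scores with a softmax, and regresses a memorability value from the attention-weighted descriptor.

```python
# Minimal sketch (not the authors' code): soft spatial attention over CNN
# feature maps, assuming ResNet-style features of shape (batch, channels, H, W).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftAttentionRegressor(nn.Module):
    """Weights each spatial location of a feature map and regresses a
    memorability score from the attended feature vector."""

    def __init__(self, channels: int = 2048):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)  # one score per location
        self.fc = nn.Linear(channels, 1)                    # memorability regressor

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feats.shape
        scores = self.attn(feats).view(b, -1)               # (b, h*w) location scores
        alpha = F.softmax(scores, dim=1).view(b, 1, h, w)   # attention weights sum to 1
        attended = (alpha * feats).sum(dim=(2, 3))          # (b, c) attended descriptor
        return torch.sigmoid(self.fc(attended))             # score in [0, 1]


# Example with hypothetical feature maps from a pre-trained backbone.
feats = torch.randn(4, 2048, 7, 7)
print(SoftAttentionRegressor(2048)(feats).shape)            # torch.Size([4, 1])
```

Feature maps from the two pre-trained networks could be fed to such a module either separately or after concatenation along the channel dimension; the abstract does not specify which strategy is used.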
