Predicting Image Memorability Through Adaptive Transfer Learning From External Sources

Remembering images is an innate human capability. Camera images are captured by different people under varying environmental conditions, which leads to highly diverse image memorability scores. However, it is unclear which factors make an image more or less memorable, and how such factors can be used to predict image memorability more accurately. In this paper, we propose a novel framework called multiview transfer learning from external sources (MTLES) to predict image memorability. The framework simultaneously leverages different types of visual feature sets and multiple types of predefined image attributes derived from external sources. In particular, to enhance the representational ability of the visual features, we connect the visual feature sets to higher-level image attributes by transferring attribute knowledge from external sources. MTLES integrates weak learning from external sources, transfer learning, and a multiview consistency loss over different types of feature sets into a joint framework. To solve the resulting joint optimization problem, we develop an alternating iterative algorithm. Experiments on the publicly available LaMem dataset demonstrate the effectiveness of the proposed scheme.
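The abstract does not state the MTLES objective, so the following is only an illustrative sketch of the general ingredients it names, under loudly stated assumptions. Attribute transfer is stood in for by a ridge regressor learned on a hypothetical external set and applied to the target images as an extra "attribute view". The multiview consistency loss is assumed to be a pairwise squared difference between per-view predictions, and the alternating algorithm is assumed to be block coordinate descent with a closed-form solve per view. The names `ridge`, `objective`, `fit_multiview`, the weights `lam` and `mu`, and all data are hypothetical and synthetic; this is not the authors' formulation.

```python
import numpy as np


def ridge(X, Y, lam):
    """Closed-form ridge regression: argmin_W ||X W - Y||^2 + lam * ||W||^2."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)


def objective(views, W, y, lam, mu):
    """Toy multiview loss (an assumption, not the MTLES objective): per-view
    squared error and ridge penalty, plus pairwise prediction disagreement."""
    preds = [X @ w for X, w in zip(views, W)]
    loss = sum(np.sum((p - y) ** 2) + lam * np.sum(w ** 2) for p, w in zip(preds, W))
    for i in range(len(preds)):
        for j in range(i + 1, len(preds)):
            loss += mu * np.sum((preds[i] - preds[j]) ** 2)
    return float(loss)


def fit_multiview(views, y, lam=1.0, mu=0.5, n_iter=20):
    """Alternating (block coordinate) minimisation: each view's weights are
    solved in closed form while the other views are held fixed, so the
    convex toy loss never increases from one sweep to the next."""
    V = len(views)
    W = [np.zeros(X.shape[1]) for X in views]
    history = [objective(views, W, y, lam, mu)]
    for _ in range(n_iter):
        for v, Xv in enumerate(views):
            # Sum of the other views' current predictions (0 if V == 1).
            others = sum(views[u] @ W[u] for u in range(V) if u != v)
            # Setting the gradient w.r.t. W[v] to zero gives this linear system.
            A = (1 + mu * (V - 1)) * (Xv.T @ Xv) + lam * np.eye(Xv.shape[1])
            b = Xv.T @ (y + mu * others)
            W[v] = np.linalg.solve(A, b)
        history.append(objective(views, W, y, lam, mu))
    return W, history


rng = np.random.default_rng(0)
n, n_ext, d1, d2, k = 200, 300, 10, 8, 4

# Hypothetical external source: visual features annotated with k attributes.
X_ext = rng.normal(size=(n_ext, d1))
P_true = rng.normal(size=(d1, k))
A_ext = X_ext @ P_true + 0.1 * rng.normal(size=(n_ext, k))
P = ridge(X_ext, A_ext, lam=1.0)  # attribute predictor learned externally

# Hypothetical target set: two visual views and synthetic memorability scores.
X1 = rng.normal(size=(n, d1))
X2 = np.hstack([X1[:, :4], rng.normal(size=(n, d2 - 4))])
y = X1 @ P_true @ rng.normal(size=k) + 0.1 * rng.normal(size=n)

# Third view: attribute scores transferred from the external source.
views = [X1, X2, X1 @ P]
W, history = fit_multiview(views, y)
y_hat = np.mean([X @ w for X, w in zip(views, W)], axis=0)
corr = np.corrcoef(y_hat, y)[0, 1]
print(f"loss {history[0]:.1f} -> {history[-1]:.1f}, corr(y_hat, y) = {corr:.3f}")
```

Solving each view exactly is one simple way to make every alternating step a guaranteed descent step on a convex loss; a real method may alternate over different variables (for example, attribute mappings and regressors) and use other update rules.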
