Using Aesthetics and Action Recognition-Based Networks for the Prediction of Media Memorability

In this working note paper, we present the contribution and results of the UPB-L2S team's participation in the MediaEval 2019 Predicting Media Memorability Task. The task requires participants to develop machine learning systems that automatically predict whether a video will be memorable for the viewer, and for how long (e.g., hours or days). To solve the task, we investigated several aesthetics and action recognition-based deep neural networks, either by fine-tuning the models or by using them as pre-trained feature extractors. The results from the different systems were then aggregated using various fusion schemes. The experimental results are promising and show the potential of transfer learning for this task.
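
To illustrate the aggregation step, the sketch below shows one simple form of late fusion: a weighted average of the per-video memorability scores produced by several systems. This is an assumption made for illustration only; the abstract does not specify which fusion schemes were used, and the function name `late_fusion`, the system names, and the weights are all hypothetical.

```python
from typing import Dict, List, Optional


def late_fusion(system_scores: Dict[str, List[float]],
                weights: Optional[Dict[str, float]] = None) -> List[float]:
    """Weighted average of per-video scores from several systems.

    Hypothetical helper: each entry of system_scores maps a system name
    to its predicted scores, listed in the same video order.
    """
    names = list(system_scores)
    if not names:
        raise ValueError("need at least one system")
    n_videos = len(system_scores[names[0]])
    if any(len(system_scores[n]) != n_videos for n in names):
        raise ValueError("all systems must score the same videos")
    # Default to equal weights when none are given.
    if weights is None:
        weights = {n: 1.0 for n in names}
    total = sum(weights[n] for n in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize by the weight sum so the output stays on the input scale.
    return [
        sum(weights[n] * system_scores[n][i] for n in names) / total
        for i in range(n_videos)
    ]


# Made-up scores for two hypothetical systems on two videos.
scores = {"aesthetics_net": [0.5, 1.0], "action_net": [1.0, 0.0]}
fused_equal = late_fusion(scores)
fused_weighted = late_fusion(scores, {"aesthetics_net": 3.0, "action_net": 1.0})
```

In practice, the weights in such a scheme would typically be chosen on a held-out validation split rather than fixed by hand.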
