Transfer Learning for Video Memorability Prediction

This paper summarizes Technicolor’s computational models for predicting video memorability, developed for the MediaEval 2018 Predicting Media Memorability Task. Our systems are based on deep learning features and architectures, and exploit both semantic and multimodal features. Based on the results obtained, we discuss our findings and outline some scientific perspectives for the task.
