Predicting Media Memorability Using Deep Features and Recurrent Network

In the Predicting Media Memorability Task at the MediaEval Challenge 2018, our team proposes an approach that uses deep visual features and a recurrent network to predict the memorability of videos. Features are extracted with a CNN from a number of frames in each video. We feed these features through an LSTM network to model the structure of the video and predict its memorability score. Our method achieves a correlation score of 0.484 on the short-term task and 0.257 on the long-term task on the official test set.
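The pipeline described above (per-frame CNN features passed through an LSTM, followed by a score prediction) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the feature dimension, hidden size, random untrained weights, sigmoid output head, and the names `init_params` and `predict_memorability` are all assumptions made for illustration, and the CNN feature extraction step is replaced by random vectors.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(feat_dim, hidden_dim, seed=0):
    """Random, untrained weights for one LSTM layer and a linear output head.
    Hypothetical helper; a real model would learn these from training data."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "W": rng.normal(0.0, scale, (4 * hidden_dim, feat_dim)),    # input weights
        "U": rng.normal(0.0, scale, (4 * hidden_dim, hidden_dim)),  # recurrent weights
        "b": np.zeros(4 * hidden_dim),                              # gate biases
        "w_out": rng.normal(0.0, scale, hidden_dim),                # regression head
        "b_out": 0.0,
    }

def predict_memorability(frame_features, params):
    """Run an LSTM over per-frame feature vectors and map the final hidden
    state to a score in (0, 1).

    frame_features: array of shape (num_frames, feat_dim), one row per frame.
    The sigmoid output head is an assumption, not taken from the paper.
    """
    hidden_dim = params["U"].shape[1]
    h = np.zeros(hidden_dim)  # hidden state
    c = np.zeros(hidden_dim)  # cell state
    for x in frame_features:
        gates = params["W"] @ x + params["U"] @ h + params["b"]
        i, f, o, g = np.split(gates, 4)  # input, forget, output, candidate
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g        # update cell state
        h = o * np.tanh(c)       # update hidden state
    return float(sigmoid(params["w_out"] @ h + params["b_out"]))

# Stand-in for CNN features: 8 frames, 2048-dimensional vectors (assumed sizes).
rng = np.random.default_rng(42)
features = rng.normal(size=(8, 2048))
params = init_params(feat_dim=2048, hidden_dim=64)
score = predict_memorability(features, params)
print(f"predicted memorability score: {score:.3f}")
```

Using only the final hidden state summarizes the whole frame sequence in one vector; pooling over all hidden states would be an equally plausible choice that the abstract does not rule out.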
