Video Memorability Prediction with Recurrent Neural Networks and Video Titles at the 2018 MediaEval Predicting Media Memorability Task

This paper describes the approach developed for predicting short-term and long-term video memorability at the 2018 MediaEval Predicting Media Memorability Task [1]. The approach derives scene semantics from video titles using natural language processing (NLP) techniques and feeds them to a recurrent neural network (RNN). Compared with video-based features, this approach has a low computational cost for feature extraction. The performance of the semantic-based methods is compared with that of aesthetic-feature-based methods using support vector regression (ε-SVR) and artificial neural network (ANN) models, and the possibility of predicting the highly subjective memorability of media from simple features is explored.
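To make the title-based pipeline concrete, the following is a minimal sketch of how title semantics might feed an RNN regressor. The vocabulary size, embedding and hidden dimensions, the choice of an LSTM cell, and the sigmoid output head are illustrative assumptions; the paper only states that NLP-derived title features are fed to a recurrent network to regress memorability scores.

```python
import torch
import torch.nn as nn

class TitleMemorabilityRNN(nn.Module):
    """Hypothetical title-to-memorability regressor (dimensions are placeholders)."""

    def __init__(self, vocab_size: int, embed_dim: int = 100, hidden_dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) integer-encoded title words, 0 = padding
        emb = self.embed(token_ids)                 # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.rnn(emb)                 # h_n: (1, batch, hidden_dim)
        score = torch.sigmoid(self.head(h_n[-1]))   # memorability score in [0, 1]
        return score.squeeze(-1)

# Toy usage: two zero-padded titles, one predicted score per video.
model = TitleMemorabilityRNN(vocab_size=5000)
titles = torch.tensor([[12, 45, 7, 0, 0], [3, 99, 1024, 8, 2]])
scores = model(titles)                              # shape: (2,)
```

The ε-SVR comparison described above could be run with a standard kernel regressor over precomputed aesthetic features; the feature dimensionality, hyperparameters, and random data below are likewise placeholders, not the paper's actual setup.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
aesthetic_features = rng.random((200, 128))   # 200 videos, 128-d aesthetic features
short_term_scores = rng.random(200)           # ground-truth memorability annotations

svr = SVR(kernel="rbf", C=1.0, epsilon=0.1)   # epsilon sets the insensitive-loss margin
svr.fit(aesthetic_features, short_term_scores)
predicted = svr.predict(aesthetic_features[:5])
```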

[1] Claire-Hélène Demarty et al. Annotating, Understanding, and Predicting Long-term Video Memorability. ICMR, 2018.

[2] Jianxiong Xiao et al. What Makes a Photograph Memorable? IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014.

[3] Sumit Shekhar et al. Show and Recall: Learning What Makes Videos Memorable. 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), 2017.

[4] Luc Van Gool et al. The Interestingness of Images. 2013 IEEE International Conference on Computer Vision, 2013.

[5] Mats Sjöberg et al. MediaEval 2018: Predicting Media Memorability. arXiv [cs.CV], 3 Jul 2018.