Overview of The MediaEval 2021 Predicting Media Memorability Task

This paper describes the MediaEval 2021 Predicting Media Memorability task, now in its fourth edition, as predicting short-term and long-term video memorability remains a challenging problem. In 2021, two video datasets are used: first, a subset of the TRECVid 2019 Video-to-Text dataset; second, the Memento10K dataset, in order to provide opportunities to explore cross-dataset generalisation. In addition, an electroencephalography (EEG)-based prediction pilot subtask is introduced. In this paper, we outline the main aspects of the task and describe the datasets, evaluation metrics, and requirements for participants' submissions.
