An Annotated Video Dataset for Computing Video Memorability

Using a collection of publicly available links to short-form video clips, each of approximately 6 seconds duration, 1,275 users manually annotated each video multiple times to indicate both the long-term and short-term memorability of the videos. The annotations were gathered as part of an online memory game and measured a participant's ability to recall having seen a video previously when shown a collection of videos. The recognition tasks were performed on videos seen within the previous few minutes for short-term memorability and within the previous 24 to 72 hours for long-term memorability. The data includes the reaction times for each recognition of each video. Associated with each video are text descriptions (captions) as well as a collection of image-level features computed on 3 frames extracted from each video (start, middle and end). Video-level features are also provided. The dataset was used in the Video Memorability task as part of the MediaEval benchmark in 2020.

∗Corresponding author. E-mail: Alan.Smeaton@DCU.ie (Alan F. Smeaton)

Specifications Table

Subject: Computer Vision and Pattern Recognition

Specific subject area: Ground truth data (videos, video features plus annotations) needed to build and train systems for the automatic computation of the memorability of short video clips

Type of data: Text files (csv)

How data were acquired: The raw videos are already publicly available online. Low-level features were extracted automatically from the videos, and annotation data was collected through crowdsourcing using a video memorability game with the participation of both volunteers and paid workers on Amazon Mechanical Turk.

Data format: Raw; Analyzed

Parameters for data collection:
Maximum false alarm rate (short-term): 30%
Maximum false alarm rate (long-term): 40%
Minimum recognition rate of vigilance fillers (short-term): 70%
Minimum recognition rate (long-term): 15%
The false alarm rate must be lower than the recognition rate (long-term).

Description of data collection: 1,500 short videos selected from the Vimeo Creative Commons (V3C1) dataset and used in the TRECVid 2019 Video-to-Text task were divided into three non-overlapping subsets: training, development, and testing. Multiple manual memorability annotations for each video were collected via a video memorability game, which displays a series of short videos and requires users to press the spacebar when they recall a video they have previously seen. The game consists of two parts: in the first part, where videos are repeated within a few minutes, user interaction with each repeated video was collected to calculate short-term memorability scores. The second part took place between 24 and 72 hours after the initial viewing of the videos, and this time the participants' responses to previously seen videos from the first part were collected to acquire long-term memorability scores. After analysing the collected annotations, the short-term and long-term memorability scores of each video were each calculated as the percentage of participants who correctly recognised that video (a scoring sketch is given after this table). Each video memorability annotation is accompanied by the video timepoint offsets at which it was recalled by users, the response times of the users, the key pressed when watching each video, and textual captions describing each video from the TRECVid benchmark.
The Media Memorability 2020 dataset is included here with memorability annotations on 590 videos as part of the training set and 410 additional videos as part of the development set. In this dataset we provide memorability annotations for the training and development set videos but not for the test set, as the test set is used in future MediaEval memorability benchmark tasks.

Data source location: Primary data source: TRECVid 2019 Video-to-Text dataset [1], available from https://www-nlpir.nist.gov/projects/trecvid/trecvid.data.html. Institution: National Institute of Standards and Technology (NIST); City: Gaithersburg; Country: USA

Data accessibility: Repository name: Figshare. Direct URL to data: https://doi.org/10.6084/m9.figshare.15105867.v2. Source code used to process the data is adapted from [2] and is available at https://github.com/InterDigitalInc/VideoMemAnnotationProtocol/ (a loading sketch follows this table).

Related research article: A. García Seco de Herrera, R. Savran Kiziltepe, J. Chamberlain, M. G. Constantin, C.-H. Demarty, F. Doctor, B. Ionescu, A. F. Smeaton. Overview of MediaEval 2020 Predicting Media Memorability Task: What Makes a Video Memorable? MediaEval Workshop, online, 14-15 December 2020 [3]. DOI:
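To make the scoring step in the description above concrete, the following is a minimal sketch of how a short-term memorability score could be computed from raw annotation records: the share of valid participants who recognised a video on its repeat showing, after discarding participants whose false alarm rate exceeds the 30% threshold listed in the parameters for data collection. The file name and column names used here (participant_id, video_id, recognised, false_alarm_rate) are illustrative assumptions rather than the exact schema of the released files.

import pandas as pd

def short_term_scores(annotations: pd.DataFrame,
                      max_false_alarm: float = 0.30) -> pd.Series:
    """One score per video: fraction of valid participants who correctly
    recognised the repeated video. Column names are assumed for illustration."""
    # Discard participants whose false alarm rate exceeds the threshold.
    valid = annotations[annotations["false_alarm_rate"] <= max_false_alarm]
    # The mean of a 0/1 "recognised" flag per video is its recognition rate.
    return valid.groupby("video_id")["recognised"].mean()

# Hypothetical usage with a per-participant, per-video annotation file:
# scores = short_term_scores(pd.read_csv("raw_annotations.csv"))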
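The released ground truth and caption files are plain csv, so they can be loaded and joined with standard tools. The sketch below assumes pandas and locally downloaded copies of the Figshare files; the file names and column names are illustrative assumptions, not the exact names used in the archive.

import pandas as pd

# Hypothetical file names; substitute the actual csv names from the Figshare archive.
scores = pd.read_csv("training_set/ground_truth.csv")         # e.g. video_id, short_term_score, long_term_score
captions = pd.read_csv("training_set/text_descriptions.csv")  # e.g. video_id, caption

# Join the textual captions onto the memorability scores by video identifier.
data = scores.merge(captions, on="video_id", how="left")
print(data.head())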