2020 Sequestered Data Evaluation for Known Activities in Extended Video: Summary and Results

This paper presents a summary and the results of the ActEV’20 SDL (Activities in Extended Video Sequestered Data Leaderboard) challenge, which was held under the CVPR’20 ActivityNet workshop [38]. The primary goal of the challenge was to provide an impetus for advancing research and capabilities in human activity detection in untrimmed multi-camera videos. Advances in activity detection will benefit a wide range of public safety applications. The challenge was administered by the National Institute of Standards and Technology (NIST): anyone could submit a system, which was then run on sequestered data, and the resulting score was posted to a public leaderboard. Ten teams submitted systems for the ActEV’20 SDL competition on the Multiview Extended Video with Activities (MEVA) test set, which has 37 target activities. The performance metric for the leaderboard ranking is the partial, normalized Area Under the Detection Error Tradeoff (DET) Curve (nAUDC), an error measure for which lower values are better. UCF ranked first on activity detection with an nAUDC of 37%, followed by CMU at 39% and OPPO at 41%.
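To make the ranking metric concrete, the following is a minimal sketch of a partial, normalized area under a DET curve. It is not the official NIST scorer. The function name `partial_normalized_audc`, the trapezoidal integration, the handling of the curve ends, and the cutoff value `fa_max` are all assumptions made for illustration; the abstract does not specify the false-alarm axis or the integration range.

```python
def partial_normalized_audc(fa, pmiss, fa_max):
    """Area under a DET curve over [0, fa_max], divided by fa_max.

    fa     -- false-alarm values of the operating points (hypothetical axis)
    pmiss  -- miss probabilities at those operating points
    fa_max -- upper integration limit (illustrative; an assumption here)
    """
    if fa_max <= 0:
        raise ValueError("fa_max must be positive")
    points = sorted(zip(fa, pmiss))
    if not points:
        raise ValueError("at least one operating point is required")

    # Assumption: hold the first miss probability back to fa = 0.
    if points[0][0] > 0:
        points.insert(0, (0.0, points[0][1]))
    # Assumption: hold the last miss probability out to fa_max.
    if points[-1][0] < fa_max:
        points.append((fa_max, points[-1][1]))

    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= fa_max:
            break
        if x1 > fa_max:
            # Clip the last segment at fa_max by linear interpolation.
            y1 = y0 + (y1 - y0) * (fa_max - x0) / (x1 - x0)
            x1 = fa_max
        area += 0.5 * (y0 + y1) * (x1 - x0)  # trapezoid rule
    return area / fa_max


# Toy DET curve; these operating points are made up for illustration.
score = partial_normalized_audc(
    fa=[0.0, 0.1, 0.2, 0.4],
    pmiss=[1.0, 0.6, 0.4, 0.2],
    fa_max=0.2,
)
print(f"nAUDC = {score:.2f}")
```

Because the result is an average miss probability over the chosen false-alarm range, a system that misses fewer activities at every operating point gets a lower score, which is why the leaderboard ranks the smallest value first.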
