The 2nd YouTube-8M Large-Scale Video Understanding Challenge

We hosted the 2nd YouTube-8M Large-Scale Video Understanding Kaggle Challenge and Workshop at ECCV’18, with the task of classifying videos from frame-level and video-level audio-visual features. In this year’s challenge, we restricted the final model size to 1 GB or less, encouraging participants to explore representation learning or better architectures instead of heavy ensembles of multiple models. In this paper, we briefly introduce the YouTube-8M dataset and the challenge task, followed by participant statistics and an analysis of the results. We summarize the ideas proposed by participants, including architectures, temporal aggregation methods, ensembling and distillation, data augmentation, and more.
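As a rough illustration of why a 1 GB cap discourages heavy ensembles, the sketch below estimates model size from parameter counts. It assumes that size is approximately parameters × bytes per parameter (ignoring serialization and graph overhead) and reads "1 GB" as 1024^3 bytes. Both are simplifying assumptions, not the official challenge rules. The function names and the five-model ensemble are hypothetical and do not come from any participant's solution.

```python
# Hypothetical sketch: estimate whether a set of models fits a 1 GB size budget.
# Assumptions (not from the paper): size ~= parameter count x bytes per parameter,
# serialization overhead is ignored, and "1 GB" is read as 1024**3 bytes.

BYTES_PER_PARAM = {"float32": 4, "float16": 2}
BUDGET_BYTES = 1024 ** 3  # assumed interpretation of "1 GB"


def estimated_size_bytes(param_counts, dtype="float32"):
    """Estimated total size in bytes of all models, stored in the given dtype."""
    return sum(param_counts) * BYTES_PER_PARAM[dtype]


def fits_budget(param_counts, dtype="float32", budget=BUDGET_BYTES):
    """True if the estimated total size does not exceed the budget."""
    return estimated_size_bytes(param_counts, dtype) <= budget


def max_params(dtype="float32", budget=BUDGET_BYTES):
    """Largest total parameter count that fits the budget for a given dtype."""
    return budget // BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    # Hypothetical ensemble: five models with 60M parameters each (300M total).
    ensemble = [60_000_000] * 5
    print(max_params("float32"))             # 268435456
    print(fits_budget(ensemble, "float32"))  # 1.2e9 bytes > 1073741824 -> False
    print(fits_budget(ensemble, "float16"))  # 6.0e8 bytes -> True
```

Under these assumptions, storing weights in lower precision, or distilling several models into a single smaller one, are two ways to bring an ensemble back under the cap.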

Rahul Sukthankar | Apostol Natsev | George Toderici | Joonseok Lee | Walter Reade