The 2nd YouTube-8M Large-Scale Video Understanding Challenge

We hosted the 2nd YouTube-8M Large-Scale Video Understanding Kaggle Challenge and Workshop at ECCV’18, with the task of classifying videos from frame-level and video-level audio-visual features. In this year’s challenge, we restricted the final model size to 1 GB or less, encouraging participants to explore representation learning or better architectures instead of heavy ensembles of multiple models. In this paper, we briefly introduce the YouTube-8M dataset and the challenge task, followed by participant statistics and an analysis of the results. We summarize the ideas proposed by participants, including architectures, temporal aggregation methods, ensembling and distillation, data augmentation, and more.
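The two constraints above, aggregating frame-level features into a video-level representation and staying within a 1 GB model budget, can be illustrated with a minimal Python sketch. It uses mean pooling, the simplest temporal aggregation baseline, and a rough size check for a single fully connected classifier. Several things are assumptions, not rules taken from this paper: the feature sizes (1024-d visual and 128-d audio per frame), the 3,862-class vocabulary, reading "1 GB" as 2**30 bytes, and storing every parameter as float32 with no checkpoint overhead. The helper names (`mean_pool`, `dense_param_count`, `fits_budget`) are hypothetical.

```python
from typing import List, Sequence

# Assumed YouTube-8M per-frame feature sizes (visual and audio).
RGB_DIM = 1024
AUDIO_DIM = 128
# Assumed label vocabulary size for the 2018 version of the dataset.
NUM_CLASSES = 3862
BYTES_PER_FLOAT32 = 4
# Assumption: "1 GB" read as 2**30 bytes; the official rule may define it differently.
SIZE_BUDGET_BYTES = 2**30


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level feature vectors into one video-level vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    totals = [0.0] * dim
    for frame in frames:
        if len(frame) != dim:
            raise ValueError("all frames must have the same dimension")
        for i, value in enumerate(frame):
            totals[i] += value
    return [t / len(frames) for t in totals]


def dense_param_count(in_dim: int, out_dim: int) -> int:
    """Number of weights plus biases in one fully connected layer."""
    return in_dim * out_dim + out_dim


def fits_budget(num_params: int, budget_bytes: int = SIZE_BUDGET_BYTES) -> bool:
    """True if the parameters, stored as float32, fit within the byte budget."""
    return num_params * BYTES_PER_FLOAT32 <= budget_bytes


if __name__ == "__main__":
    # Two toy 2-d frames pooled into one video-level vector.
    print(mean_pool([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
    # A logistic classifier on concatenated visual + audio features.
    n = dense_param_count(RGB_DIM + AUDIO_DIM, NUM_CLASSES)
    print(n, fits_budget(n))  # 4452886 True
```

Under these assumptions the budget allows roughly 268 million float32 parameters, so a single linear classifier (about 17.8 MB) is far below it; the limit mainly bites when many large frame-level models are ensembled, which is the behavior the size restriction was meant to discourage.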