Paper information - Reproducibility Companion Paper: Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework

In this companion paper, we provide details of the artifacts that support the replication of "Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework", which was presented at MM'20. The Inter-intra Contrastive (IIC) framework aims to extract more discriminative temporal information by introducing intra-negative samples into contrastive self-supervised learning. In this paper, we first summarize our contribution, then explain the file structure of the source code and the detailed settings. Because our proposal is a framework that contains many different settings, we provide several custom settings to help other researchers use our method easily. The source code is available at https://github.com/BestJuly/IIC.
