Reproducibility Companion Paper: Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework

In this companion paper, we provide details of the artifacts that support the replication of "Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework", which was presented at MM'20. The Inter-intra Contrastive (IIC) framework aims to extract more discriminative temporal information by extending contrastive self-supervised learning with intra-negative samples. We first summarize our contributions, then explain the file structure of the source code and the detailed settings. Because the framework supports many different settings, we also provide several custom configurations to help other researchers use our method easily. The source code is available at https://github.com/BestJuly/IIC.
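To make the notion of an intra-negative sample concrete, the following is a minimal sketch rather than the code in the repository. It assumes that a clip is simply a list of frames and that an intra-negative is obtained from the same video by breaking its temporal structure, either by shuffling the frame order or by repeating a single frame. The function name `make_intra_negative` and the `mode` values are hypothetical and chosen only for illustration.

```python
import random


def make_intra_negative(clip, mode="shuffle", seed=None):
    """Return a temporally corrupted copy of `clip` (a list of frames).

    Illustrative assumption: the corrupted clip keeps the appearance of
    the source video but loses its correct temporal order, so it can be
    used as a negative sample for the original clip.
    """
    if len(clip) < 2:
        raise ValueError("need at least two frames")
    rng = random.Random(seed)

    if mode == "repeat":
        # Repeat one randomly chosen frame, which removes all motion.
        frame = rng.choice(clip)
        return [frame] * len(clip)

    if mode == "shuffle":
        # Permute frame indices until the order differs from the original.
        original = list(range(len(clip)))
        order = original[:]
        while order == original:
            rng.shuffle(order)
        return [clip[i] for i in order]

    raise ValueError(f"unknown mode: {mode}")


# Usage: frames are stand-in labels here; real frames would be image arrays.
clip = ["f0", "f1", "f2", "f3", "f4", "f5", "f6", "f7"]
shuffled = make_intra_negative(clip, mode="shuffle", seed=0)
repeated = make_intra_negative(clip, mode="repeat", seed=0)
```

In a contrastive setup, such a clip would be added to the negative set of its source clip alongside the usual negatives drawn from other videos, so the encoder cannot rely on appearance alone to tell them apart.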
