[1] Lorenzo Torresani, et al. Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization, 2018, NeurIPS.
[2] Fabio Viola, et al. The Kinetics Human Action Video Dataset, 2017, ArXiv.
[3] Gunnar Farnebäck. Two-Frame Motion Estimation Based on Polynomial Expansion, 2003, SCIA.
[4] Yutaka Satoh, et al. Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[5] Bolei Zhou, et al. Learning Deep Features for Discriminative Localization, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Chen Gao, et al. Why Can't I Dance in the Mall? Learning to Mitigate Scene Bias in Action Recognition, 2019, NeurIPS.
[7] Yiqi Lin, et al. Learning Spatio-temporal Representation by Channel Aliasing Video Perception, 2021, ACM Multimedia.
[8] Cordelia Schmid, et al. Learning Video Representations using Contrastive Bidirectional Transformer, 2019.
[9] Angelika Bayer, et al. A First Course In Probability, 2016.
[10] Andrew Zisserman, et al. Self-supervised Co-training for Video Representation Learning, 2020, NeurIPS.
[11] William T. Freeman, et al. SpeedNet: Learning the Speediness in Videos, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Honglak Lee, et al. Learning Structured Output Representation using Deep Conditional Generative Models, 2015, NIPS.
[13] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.
[14] Yueting Zhuang, et al. Self-Supervised Spatiotemporal Learning via Video Clip Order Prediction, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Kaiming He, et al. Momentum Contrast for Unsupervised Visual Representation Learning, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Jianbo Jiao, et al. Self-supervised Video Representation Learning by Pace Prediction, 2020, ECCV.
[17] Andrew Owens, et al. Audio-Visual Scene Analysis with Self-Supervised Multisensory Features, 2018, ECCV.
[18] Ke Li, et al. Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion, 2020, AAAI.
[19] Weiping Wang, et al. Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning, 2020, AAAI.
[20] Serge J. Belongie, et al. Spatiotemporal Contrastive Video Representation Learning, 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Andrew Zisserman, et al. Video Representation Learning by Dense Predictive Coding, 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[22] Ullrich Köthe, et al. Guided Image Generation with Conditional Invertible Neural Networks, 2019, ArXiv.
[23] Yi Li, et al. RESOUND: Towards Action Recognition Without Representation Bias, 2018, ECCV.
[24] Yiqi Lin, et al. Multi-Level Temporal Dilated Dense Prediction for Action Recognition, 2022, IEEE Transactions on Multimedia.
[25] Mubarak Shah, et al. UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild, 2012, ArXiv.
[26] Yuting Gao, et al. Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning, 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Geoffrey E. Hinton, et al. A Simple Framework for Contrastive Learning of Visual Representations, 2020, ICML.
[28] Gunhee Kim, et al. Self-Supervised Learning of Compressed Video Representations, 2021, ICLR.
[29] Ming-Hsuan Yang, et al. Unsupervised Representation Learning by Sorting Sequences, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[30] Premkumar Natarajan, et al. Bidirectional Conditional Generative Adversarial Networks, 2017, ACCV.
[31] Andrew Zisserman, et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Andrew Zisserman, et al. Memory-augmented Dense Predictive Coding for Video Representation Learning, 2020, ECCV.
[33] Thomas Serre, et al. HMDB: A large video database for human motion recognition, 2011, 2011 International Conference on Computer Vision.
[34] Yu Zhou, et al. Video Playback Rate Perception for Self-Supervised Spatio-Temporal Representation Learning, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[36] Prafulla Dhariwal, et al. Glow: Generative Flow with Invertible 1x1 Convolutions, 2018, NeurIPS.
[37] Hadi M. Dolatabadi, et al. AdvFlow: Inconspicuous Black-box Adversarial Attacks using Normalizing Flows, 2020, NeurIPS.
[38] Runhao Zeng, et al. RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning, 2020, AAAI.
[39] Cordelia Schmid, et al. VideoBERT: A Joint Model for Video and Language Representation Learning, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[40] Ullrich Köthe, et al. Analyzing Inverse Problems with Invertible Neural Networks, 2018, ICLR.
[41] Martial Hebert, et al. Shuffle and Learn: Unsupervised Learning Using Temporal Order Verification, 2016, ECCV.
[42] Stella X. Yu, et al. Unsupervised Feature Learning via Non-parametric Instance Discrimination, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[43] Chen Sun, et al. Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification, 2017, ECCV.
[44] Khiem Doan, et al. ultralytics/yolov5: v4.0 - nn.SiLU() activations, Weights & Biases logging, PyTorch Hub integration, 2021.
[45] Yoshua Bengio, et al. NICE: Non-linear Independent Components Estimation, 2014, ICLR.
[46] Tie-Yan Liu, et al. Invertible Image Rescaling, 2020, ECCV.
[47] Luc Van Gool, et al. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition, 2016, ECCV.
[48] Ivan Kobyzev, et al. Normalizing Flows: An Introduction and Review of Current Methods, 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[49] Kaiming He, et al. Improved Baselines with Momentum Contrastive Learning, 2020, ArXiv.
[50] Bernard Ghanem, et al. Self-Supervised Learning by Cross-Modal Audio-Video Clustering, 2019, NeurIPS.
[51] Samy Bengio, et al. Density estimation using Real NVP, 2016, ICLR.
[52] Yali Wang, et al. MorphMLP: A Self-Attention Free, MLP-Like Backbone for Image and Video, 2021, ArXiv.
[53] Longlong Jing, et al. Self-Supervised Spatiotemporal Feature Learning via Video Rotation Prediction, 2018, ArXiv (arXiv:1811.11387).