暂无分享,去创建一个
[1] Ivan Laptev,et al. Unsupervised Learning from Narrated Instruction Videos , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Andrew Zisserman,et al. A Short Note on the Kinetics-700 Human Action Dataset , 2019, ArXiv.
[3] Andrew Zisserman,et al. Learning and Using the Arrow of Time , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[4] Bernard Ghanem,et al. Self-Supervised Learning by Cross-Modal Audio-Video Clustering , 2019, NeurIPS.
[5] Trevor Darrell,et al. Learning Features by Watching Objects Move , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Antonio Criminisi,et al. Harvesting Image Databases from the Web , 2007, 2007 IEEE 11th International Conference on Computer Vision.
[7] Wei-Ying Ma,et al. Annotating Images by Mining Image Search Results , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[8] Vicente Ordonez,et al. Im2Text: Describing Images Using 1 Million Captioned Photographs , 2011, NIPS.
[9] Apostol Natsev,et al. YouTube-8M: A Large-Scale Video Classification Benchmark , 2016, ArXiv.
[10] Pietro Perona,et al. Learning Object Categories From Internet Image Searches , 2010, Proceedings of the IEEE.
[11] Andrew Owens,et al. Ambient Sound Provides Supervision for Visual Learning , 2016, ECCV.
[12] Cordelia Schmid,et al. Contrastive Bidirectional Transformer for Temporal Representation Learning , 2019, ArXiv.
[13] Zihang Lai,et al. Self-supervised Learning for Video Correspondence Flow , 2019, ArXiv.
[14] Ivan Laptev,et al. End-to-End Learning of Visual Representations From Uncurated Instructional Videos , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Amit K. Roy-Chowdhury,et al. Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval , 2018, ACM Multimedia.
[16] Ivan Laptev,et al. HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[17] Andrew Zisserman,et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Geoffrey E. Hinton,et al. On the importance of initialization and momentum in deep learning , 2013, ICML.
[19] Thomas Serre,et al. HMDB: A large video database for human motion recognition , 2011, 2011 International Conference on Computer Vision.
[20] Lorenzo Torresani,et al. Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization , 2018, NeurIPS.
[21] Allan Jabri,et al. Learning Visual N-Grams from Web Data , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[22] Martial Hebert,et al. Shuffle and Learn: Unsupervised Learning Using Temporal Order Verification , 2016, ECCV.
[23] Hilde Kuehne,et al. Mining YouTube - A dataset for learning fine-grained action concepts from webly supervised video data , 2019, ArXiv.
[24] Antonio Torralba,et al. Anticipating Visual Representations from Unlabeled Video , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Cordelia Schmid,et al. AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[26] Ming-Hsuan Yang,et al. Unsupervised Representation Learning by Sorting Sequences , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[27] Heng Wang,et al. Large-Scale Weakly-Supervised Pre-Training for Video Action Recognition , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Efstratios Gavves,et al. Self-Supervised Video Representation Learning with Odd-One-Out Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Xinlei Chen,et al. Webly Supervised Learning of Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[30] Chen Sun,et al. Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification , 2017, ECCV.
[31] Sergio Guadarrama,et al. Tracking Emerges by Colorizing Videos , 2018, ECCV.
[32] Chuang Gan,et al. The Sound of Pixels , 2018, ECCV.
[33] Yueting Zhuang,et al. Self-Supervised Spatiotemporal Learning via Video Clip Order Prediction , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Phillip Isola,et al. Contrastive Multiview Coding , 2019, ECCV.
[35] Chenliang Xu,et al. Towards Automatic Learning of Procedures From Web Instructional Videos , 2017, AAAI.
[36] Yingli Tian,et al. Self-supervised Spatiotemporal Feature Learning by Video Geometric Transformations , 2018, ArXiv.
[37] Chuang Gan,et al. Self-supervised Audio-visual Co-segmentation , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[38] Andrew Zisserman,et al. Look, Listen and Learn , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[39] Allan Jabri,et al. Learning Visual Features from Large Weakly Supervised Data , 2015, ECCV.
[40] Antonio Torralba,et al. SoundNet: Learning Sound Representations from Unlabeled Video , 2016, NIPS.
[41] Cordelia Schmid,et al. Learning Video Representations using Contrastive Bidirectional Transformer , 2019 .
[42] Xinlei Chen,et al. NEIL: Extracting Visual Knowledge from Web Data , 2013, 2013 IEEE International Conference on Computer Vision.
[43] Andrew Owens,et al. Audio-Visual Scene Analysis with Self-Supervised Multisensory Features , 2018, ECCV.
[44] Gabriel Kreiman,et al. Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning , 2016, ICLR.
[45] In-So Kweon,et al. Self-Supervised Video Representation Learning with Space-Time Cubic Puzzles , 2018, AAAI.
[46] Longlong Jing,et al. Self-Supervised Spatiotemporal Feature Learning via Video Rotation Prediction. , 2018, 1811.11387.
[47] Kaiming He,et al. Exploring the Limits of Weakly Supervised Pretraining , 2018, ECCV.
[48] Ali Farhadi,et al. Learning Everything about Anything: Webly-Supervised Visual Concept Learning , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[49] Ali Farhadi,et al. Deep Classifiers from Image Tags in the Wild , 2015, MMCommons '15.
[50] Mubarak Shah,et al. UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild , 2012, ArXiv.
[51] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[52] Wei Liu,et al. Self-Supervised Spatio-Temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[53] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[54] Limin Wang,et al. Learning Spatiotemporal Features via Video and Text Pair Discrimination , 2020, ArXiv.
[55] Chen Sun,et al. Webly-Supervised Video Recognition by Mutually Voting for Relevant Web Images and Web Video Frames , 2016, ECCV.
[56] Alexander G. Hauptmann,et al. Instructional Videos for Unsupervised Harvesting and Learning of Action Examples , 2014, ACM Multimedia.
[57] Frank Hutter,et al. SGDR: Stochastic Gradient Descent with Warm Restarts , 2016, ICLR.
[58] Svetlana Lazebnik,et al. Improving Image-Sentence Embeddings Using Large Weakly Annotated Photo Collections , 2014, ECCV.
[59] Lorenzo Torresani,et al. Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach , 2010, NIPS.
[60] Fabio Viola,et al. The Kinetics Human Action Video Dataset , 2017, ArXiv.
[61] Michael Isard,et al. A Multi-View Embedding Space for Modeling Internet Images, Tags, and Their Semantics , 2012, International Journal of Computer Vision.
[62] Ivan Laptev,et al. Cross-Task Weakly Supervised Learning From Instructional Videos , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[63] Yann LeCun,et al. Deep multi-scale video prediction beyond mean square error , 2015, ICLR.
[64] Andrew Zisserman,et al. A Short Note about Kinetics-600 , 2018, ArXiv.
[65] Andrew Zisserman,et al. Video Representation Learning by Dense Predictive Coding , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[66] Antonio Torralba,et al. Generating Videos with Scene Dynamics , 2016, NIPS.
[67] Cordelia Schmid,et al. Speech2Action: Cross-Modal Supervision for Action Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[68] Leonidas J. Guibas,et al. Geometry Guided Convolutional Neural Networks for Self-Supervised Video Representation Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.