Ego-Only: Egocentric Action Detection without Exocentric Transferring
暂无分享,去创建一个
[1] Ishan Misra,et al. Learning Video Representations from Large Language Models , 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Mike Zheng Shou,et al. Egocentric Video-Language Pretraining , 2022, NeurIPS.
[3] Haoqi Fan,et al. Masked Autoencoders As Spatiotemporal Learners , 2022, NeurIPS.
[4] Gedas Bertasius,et al. TALLFormer: Temporal Action Localization with Long-memory Transformer , 2022, ECCV.
[5] Limin Wang,et al. VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training , 2022, NeurIPS.
[6] Bernard Ghanem,et al. SegTAD: Precise Temporal Action Detection via Semantic Segmentation , 2022, ECCV Workshops.
[7] Yin Li,et al. ActionFormer: Localizing Moments of Actions with Transformers , 2022, ECCV.
[8] J. Malik,et al. MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[9] L. V. D. Maaten,et al. Omnivore: A Single Model for Many Visual Modalities , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[10] C. Schmid,et al. Multiview Transformers for Video Recognition , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[11] A. Yuille,et al. Masked Feature Prediction for Self-Supervised Visual Pre-Training , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[12] J. Malik,et al. MViTv2: Improved Multiscale Vision Transformers for Classification and Detection , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Ross B. Girshick,et al. Masked Autoencoders Are Scalable Vision Learners , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Jian Ma,et al. Rescaling Egocentric Vision: Collection, Pipeline and Challenges for EPIC-KITCHENS-100 , 2021, Int. J. Comput. Vis..
[15] James M. Rehg,et al. Ego4D: Around the World in 3,000 Hours of Egocentric Video , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Stephen Lin,et al. Video Swin Transformer , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[17] L. Torresani,et al. Long-Short Temporal Contrastive Learning of Video Transformers , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Tao Kong,et al. iBOT: Image BERT Pre-Training with Online Tokenizer , 2021, ArXiv.
[19] Dima Damen,et al. With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition , 2021, BMVC.
[20] Heng Wang,et al. Interactive Prototype Learning for Egocentric Action Recognition , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[21] Julien Mairal,et al. Emerging Properties in Self-Supervised Vision Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[22] Christoph Feichtenhofer,et al. Multiscale Vision Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[23] K. Grauman,et al. Ego-Exo: Transferring Visual Representations from Third-person to First-person Videos , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Andrew Zisserman,et al. Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[25] Cordelia Schmid,et al. ViViT: A Video Vision Transformer , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[26] Matthew A. Brown,et al. MoViNets: Mobile Video Networks for Efficient Video Recognition , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Heng Wang,et al. Is Space-Time Attention All You Need for Video Understanding? , 2021, ICML.
[28] A. Yuille,et al. MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Xinlei Chen,et al. Exploring Simple Siamese Representation Learning , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] S. Gelly,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.
[31] Dima Damen,et al. The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[32] Stephen Lin,et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[33] Julien Mairal,et al. Unsupervised Learning of Visual Features by Contrasting Cluster Assignments , 2020, NeurIPS.
[34] Pierre H. Richemond,et al. Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning , 2020, NeurIPS.
[35] Christoph Feichtenhofer,et al. X3D: Expanding Architectures for Efficient Video Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Gaurav Sharma,et al. Unsupervised and Semi-Supervised Domain Adaptation for Action Recognition from Drones , 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).
[37] Geoffrey E. Hinton,et al. A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.
[38] Ali K. Thabet,et al. G-TAD: Sub-Graph Localization for Temporal Action Detection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Ross B. Girshick,et al. Momentum Contrast for Unsupervised Visual Representation Learning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Quoc V. Le,et al. Randaugment: Practical automated data augmentation with a reduced search space , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[41] Ross B. Girshick,et al. Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[42] Ross B. Girshick,et al. Mask R-CNN , 2017, 1703.06870.
[43] Shilei Wen,et al. BMN: Boundary-Matching Network for Temporal Action Proposal Generation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[44] Seong Joon Oh,et al. CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[45] Yansong Tang,et al. COIN: A Large-Scale Dataset for Comprehensive Instructional Video Analysis , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[46] Jitendra Malik,et al. SlowFast Networks for Video Recognition , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[47] Qi Tian,et al. Iterative Reorganization With Weak Spatial Constraints: Solving Arbitrary Jigsaw Puzzles for Unsupervised Representation Learning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[48] Kaiming He,et al. Rethinking ImageNet Pre-Training , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[49] Chuang Gan,et al. TSM: Temporal Shift Module for Efficient Video Understanding , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[50] Frank Hutter,et al. Decoupled Weight Decay Regularization , 2017, ICLR.
[51] Limin Wang,et al. Temporal Segment Networks for Action Recognition in Videos , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[52] Bernard Ghanem,et al. Diagnosing Error in Temporal Action Detectors , 2018, ECCV.
[53] Matthijs Douze,et al. Deep Clustering for Unsupervised Learning of Visual Features , 2018, ECCV.
[54] Stella X. Yu,et al. Unsupervised Feature Learning via Non-parametric Instance Discrimination , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[55] Cordelia Schmid,et al. Actor and Observer: Joint Modeling of First and Third-Person Videos , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[56] Cordelia Schmid,et al. Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos , 2018, ArXiv.
[57] Nikos Komodakis,et al. Unsupervised Representation Learning by Predicting Image Rotations , 2018, ICLR.
[58] George Papandreou,et al. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.
[59] Yann LeCun,et al. A Closer Look at Spatiotemporal Convolutions for Action Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[60] Abhinav Gupta,et al. Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[61] Hongyi Zhang,et al. mixup: Beyond Empirical Risk Minimization , 2017, ICLR.
[62] Iasonas Kokkinos,et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[63] Xu Zhao,et al. Single Shot Temporal Action Detection , 2017, ACM Multimedia.
[64] Kaiming He,et al. Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[65] George Papandreou,et al. Rethinking Atrous Convolution for Semantic Image Segmentation , 2017, ArXiv.
[66] Andrew Zisserman,et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[67] Fabio Viola,et al. The Kinetics Human Action Video Dataset , 2017, ArXiv.
[68] Haroon Idrees,et al. The THUMOS challenge on action recognition for videos "in the wild" , 2016, Comput. Vis. Image Underst..
[69] Alan L. Yuille,et al. UnrealCV: Connecting Computer Vision to Unreal Engine , 2016, ECCV Workshops.
[70] Kilian Q. Weinberger,et al. Deep Networks with Stochastic Depth , 2016, ECCV.
[71] Paolo Favaro,et al. Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles , 2016, ECCV.
[72] Bernard Ghanem,et al. ActivityNet: A large-scale video benchmark for human activity understanding , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[73] Alexei A. Efros,et al. Unsupervised Visual Representation Learning by Context Prediction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[74] Iasonas Kokkinos,et al. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.
[75] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[76] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[77] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[78] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[79] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .
[80] Yann LeCun,et al. The mnist database of handwritten digits , 2005 .
[81] G. LoweDavid,et al. Distinctive Image Features from Scale-Invariant Keypoints , 2004 .