Few-shot Action Recognition with Captioning Foundation Models
[1] Jingren Zhou, et al. VideoComposer: Compositional Video Synthesis with Motion Controllability, 2023, NeurIPS.
[2] Shiwei Zhang, et al. Cross-domain few-shot action recognition with unlabeled videos, 2023, Computer Vision and Image Understanding.
[3] Shiwei Zhang, et al. MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Action Recognition, 2023, CVPR.
[4] Shiwei Zhang, et al. CLIP-guided Prototype Modulating for Few-shot Action Recognition, 2023, International Journal of Computer Vision.
[5] Shiwei Zhang, et al. HyRSM++: Hybrid Relation Guided Temporal Set Matching for Few-shot Action Recognition, 2023, arXiv.
[6] Mike Zheng Shou, et al. Position-Guided Text Prompt for Vision-Language Pre-Training, 2022, CVPR 2023.
[7] Qun Liu, et al. LiteVL: Efficient Video-Language Learning with Enhanced Spatial-Temporal Modeling, 2022, EMNLP.
[8] S. Savarese, et al. Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training, 2022, EMNLP.
[9] Yu-Gang Jiang, et al. OmniVL: One Foundation Model for Image-Language and Video-Language Tasks, 2022, NeurIPS.
[10] Samuel Albanie, et al. RLIP: Relational Language-Image Pre-training for Human-Object Interaction Detection, 2022, NeurIPS.
[11] Quoc-Huy Tran, et al. Inductive and Transductive Few-Shot Video Classification via Appearance and Temporal Alignments, 2022, ECCV.
[12] Percy Liang, et al. Is a Caption Worth a Thousand Images? A Controlled Study for Representation Learning, 2022, arXiv.
[13] Yifei Huang, et al. Compound Prototype Matching for Few-shot Action Recognition, 2022, ECCV.
[14] Tianzhu Zhang, et al. Motion-modulated Temporal Fragment Alignment Network for Few-Shot Action Recognition, 2022, CVPR.
[15] Zhe Gan, et al. GIT: A Generative Image-to-text Transformer for Vision and Language, 2022, Transactions on Machine Learning Research.
[16] Derek Hoiem, et al. Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners, 2022, NeurIPS.
[17] Zirui Wang, et al. CoCa: Contrastive Captioners are Image-Text Foundation Models, 2022, Transactions on Machine Learning Research.
[18] Oriol Vinyals, et al. Flamingo: a Visual Language Model for Few-Shot Learning, 2022, NeurIPS.
[19] Shiwei Zhang, et al. Hybrid Relation Guided Set Matching for Few-shot Action Recognition, 2022, CVPR.
[20] Jingren Zhou, et al. OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework, 2022, ICML.
[21] S. Hoi, et al. BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation, 2022, ICML.
[22] F. Khan, et al. Spatio-temporal Relation Modeling for Few-shot Action Recognition, 2021, CVPR 2022.
[23] Dongdong Chen, et al. CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields, 2021, CVPR 2022.
[24] Weidi Xie, et al. Prompting Visual-Language Models for Efficient Video Understanding, 2021, ECCV.
[25] Lorenzo Torresani, et al. Label Hallucination for Few-Shot Classification, 2021, AAAI.
[26] Xiaowei Hu, et al. Scaling Up Vision-Language Pretraining for Image Captioning, 2021, CVPR 2022.
[27] Daniel Keysers, et al. LiT: Zero-Shot Transfer with Locked-image text Tuning, 2021, CVPR 2022.
[28] Li Dong, et al. VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts, 2021, NeurIPS.
[29] Chen Change Loy, et al. Learning to Prompt for Vision-Language Models, 2021, International Journal of Computer Vision.
[30] Massimiliano Pontil, et al. The Role of Global Labels in Few-Shot Classification and How to Infer Them, 2021, NeurIPS.
[31] Junnan Li, et al. Align before Fuse: Vision and Language Representation Learning with Momentum Distillation, 2021, NeurIPS.
[32] John See, et al. TA2N: Two-Stage Action Alignment Network for Few-Shot Action Recognition, 2021, AAAI.
[33] Yejin Choi, et al. VinVL: Revisiting Visual Representations in Vision-Language Models, 2021, CVPR.
[34] Yongjian Wu, et al. RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words, 2021, CVPR.
[35] Songyang Zhang, et al. Learning Implicit Temporal Alignment for Few-shot Video Classification, 2021, IJCAI.
[36] Yin Cui, et al. Open-vocabulary Object Detection via Vision and Language Knowledge Distillation, 2021, ICLR.
[37] Ilya Sutskever, et al. Learning Transferable Visual Models From Natural Language Supervision, 2021, ICML.
[38] Quoc V. Le, et al. Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision, 2021, ICML.
[39] Majid Mirmehdi, et al. Temporal-Relational CrossTransformers for Few-Shot Action Recognition, 2021, CVPR.
[40] Lingfeng Wang, et al. Few-Shot Learning via Feature Hallucination with Variational Inference, 2021, WACV.
[41] S. Gelly, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2020, ICLR.
[42] Yi Yang, et al. Label Independent Memory for Semi-Supervised Few-Shot Video Classification, 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[43] Jianfeng Gao, et al. DeBERTa: Decoding-enhanced BERT with Disentangled Attention, 2020, ICLR.
[44] Jianfeng Gao, et al. Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks, 2020, ECCV.
[45] Kai Li, et al. Adversarial Feature Hallucination Networks for Few-Shot Learning, 2020, CVPR.
[46] Hongdong Li, et al. Few-Shot Action Recognition with Permutation-Invariant Attention, 2020, ECCV.
[47] F. Hutter, et al. Meta-Learning of Neural Architectures for Few-Shot Learning, 2019, CVPR 2020.
[48] Tao Xiang, et al. Few-Shot Learning With Global Class Representations, 2019, ICCV.
[49] Ioannis Patras, et al. TARN: Temporal Attentive Relation Network for Few-Shot and Zero-Shot Action Recognition, 2019, BMVC.
[50] Juan Carlos Niebles, et al. Few-Shot Video Classification via Temporal Alignment, 2019, CVPR 2020.
[51] Sung Whan Yoon, et al. TapNet: Neural Network Augmented with Task-Adaptive Projection for Few-Shot Learning, 2019, ICML.
[52] Yu-Wing Tai, et al. Memory-Attended Recurrent Network for Video Captioning, 2019, CVPR.
[53] Yejin Choi, et al. The Curious Case of Neural Text Degeneration, 2019, ICLR.
[54] Jing Zhang, et al. Few-Shot Learning via Saliency-Guided Hallucination of Samples, 2019, CVPR.
[55] Fei Sha, et al. Few-Shot Learning via Embedding Adaptation With Set-to-Set Functions, 2018, CVPR 2020.
[56] Chuang Gan, et al. TSM: Temporal Shift Module for Efficient Video Understanding, 2018, ICCV 2019.
[57] Yi Yang, et al. Compound Memory Networks for Few-Shot Video Classification, 2018, ECCV.
[58] Mubarak Shah, et al. Task Agnostic Meta-Learning for Few-Shot Learning, 2018, CVPR 2019.
[59] Wei Liu, et al. Reconstruction Network for Video Captioning, 2018, CVPR.
[60] Bolei Zhou, et al. Temporal Relational Reasoning in Videos, 2017, ECCV.
[61] Tao Xiang, et al. Learning to Compare: Relation Network for Few-Shot Learning, 2017, CVPR 2018.
[62] Hang Li, et al. Meta-SGD: Learning to Learn Quickly for Few Shot Learning, 2017, arXiv.
[63] Lei Zhang, et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering, 2017, CVPR 2018.
[64] Susanne Westphal, et al. The "Something Something" Video Database for Learning and Evaluating Visual Common Sense, 2017, ICCV.
[65] Wei Shen, et al. Few-Shot Image Recognition by Predicting Parameters from Activations, 2017, CVPR 2018.
[66] Andrew Zisserman, et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset, 2017, CVPR.
[67] Richard S. Zemel, et al. Prototypical Networks for Few-shot Learning, 2017, NIPS.
[68] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.
[69] Tao Mei, et al. Video Captioning with Transferred Semantic Attributes, 2016, CVPR 2017.
[70] Hugo Larochelle, et al. Optimization as a Model for Few-Shot Learning, 2016, ICLR.
[71] Abhishek Das, et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization, 2016, ICCV 2017.
[72] Luc Van Gool, et al. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition, 2016, ECCV.
[73] Oriol Vinyals, et al. Matching Networks for One Shot Learning, 2016, NIPS.
[74] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, CVPR 2016.
[75] Yoshua Bengio, et al. On Using Monolingual Corpora in Neural Machine Translation, 2015, arXiv.
[76] Yoshua Bengio, et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, 2015, ICML.
[77] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[78] Fei-Fei Li, et al. Deep visual-semantic alignments for generating image descriptions, 2014, CVPR 2015.
[79] Samy Bengio, et al. Show and tell: A neural image caption generator, 2014, CVPR 2015.
[80] Tal Hassner, et al. One Shot Similarity Metric Learning for Action Recognition, 2011, SIMBAD.
[81] Prateek Jain, et al. Far-sighted active learning on a budget for image and video recognition, 2010, CVPR.
[82] Fei-Fei Li, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[83] Meinard Müller. Information retrieval for music and motion, 2007.
[84] Pietro Perona, et al. One-shot learning of object categories, 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[85] Qin Jin, et al. Few-Shot Action Recognition with Hierarchical Matching and Contrastive Learning, 2022, ECCV.
[86] Wai Keen Vong, et al. Few-shot image classification by generating natural language rules, 2022.
[87] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[88] Ramprasaath R. Selvaraju, et al. Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization, 2016.
[89] Alan Bundy, et al. Dynamic Time Warping, 1984.