Multi-Modal Few-Shot Temporal Action Detection via Vision-Language Meta-Adaptation
Bernard Ghanem | Mengmeng Xu | Xiatian Zhu | Tao Xiang | Yi-Zhe Song | Sauradip Nag | Juan-Manuel Pérez-Rúa
[1] Xiatian Zhu,et al. Zero-Shot Temporal Action Detection via Vision-Language Prompting , 2022, ECCV.
[2] Xiatian Zhu,et al. Proposal-Free Temporal Action Detection via Global Segmentation Mask Learning , 2022, ECCV.
[3] Jiangliu Wang,et al. AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition , 2022, NeurIPS.
[4] Thomas Kipf,et al. Simple Open-Vocabulary Object Detection with Vision Transformers , 2022, ArXiv.
[5] Ramalingam Chellappa,et al. Multimodal Few-Shot Object Detection with Meta-Learning Based Cross-Modal Prompting , 2022, ArXiv.
[6] Chen Change Loy,et al. Open-Vocabulary DETR with Conditional Matching , 2022, ECCV.
[7] Chen Change Loy,et al. Conditional Prompt Learning for Vision-Language Models , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] A. Schwing,et al. Masked-attention Mask Transformer for Universal Image Segmentation , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Chen Change Loy,et al. Learning to Prompt for Vision-Language Models , 2021, International Journal of Computer Vision.
[10] Chi-Keung Tang,et al. Few-Shot Video Object Detection , 2021, ECCV.
[11] Peng Gao,et al. Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling , 2021, ArXiv.
[12] Tao Xiang,et al. Few-Shot Temporal Action Localization with Query Adaptive Transformer , 2021, BMVC.
[13] Peng Gao,et al. CLIP-Adapter: Better Vision-Language Models with Feature Adapters , 2021, International Journal of Computer Vision.
[14] Shih-Fu Chang,et al. Query Adaptive Few-Shot Object Detection with Heterogeneous Graph Convolutional Networks , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[15] Mengmeng Wang,et al. ActionCLIP: A New Paradigm for Video Action Recognition , 2021, ArXiv.
[16] Niamul Quader,et al. Class Semantics-based Attention for Action Detection , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[17] Alexander G. Schwing,et al. Per-Pixel Classification is Not All You Need for Semantic Segmentation , 2021, NeurIPS.
[18] Bernard Ghanem,et al. Low-Fidelity Video Encoder Optimization for Temporal Action Localization , 2021, NeurIPS.
[19] Ilya Sutskever,et al. Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.
[20] Quoc V. Le,et al. Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision , 2021, ICML.
[21] Limin Wang,et al. Relaxed Transformer Decoders for Direct Action Proposal Generation , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[22] Bin Li,et al. Deformable DETR: Deformable Transformers for End-to-End Object Detection , 2020, ICLR.
[23] Amit K. Roy-Chowdhury,et al. Text-Based Localization of Moments in a Video Corpus , 2020, IEEE Transactions on Image Processing.
[24] Shijian Lu,et al. Meta-DETR: Few-Shot Object Detection via Unified Image-Level Meta-Learning , 2021, ArXiv.
[25] Cees G. M. Snoek,et al. Localizing the Common Action Among a Few Videos , 2020, ECCV.
[26] Xiyang Dai,et al. METAL: Minimum Effort Temporal Activity Localization in Untrimmed Videos , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Trevor Darrell,et al. Frustratingly Simple Few-Shot Object Detection , 2020, ICML.
[28] Geoffrey E. Hinton,et al. A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.
[29] Andrew Zisserman,et al. End-to-End Learning of Visual Representations From Uncurated Instructional Videos , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Ali K. Thabet,et al. G-TAD: Sub-Graph Localization for Temporal Action Detection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Yu-Wing Tai,et al. Few-Shot Object Detection With Attention-RPN and Multi-Relation Detector , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Shilei Wen,et al. BMN: Boundary-Matching Network for Temporal Action Proposal Generation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[33] Deyu Meng,et al. Few-Example Object Detection with Model Communication , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[34] Yazan Abu Farha,et al. MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Xin Wang,et al. Few-Shot Object Detection via Feature Reweighting , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[36] Ming Yang,et al. BSN: Boundary Sensitive Network for Temporal Action Proposal Generation , 2018, ECCV.
[37] Fatih Murat Porikli,et al. One-Shot Action Localization by Learning Sequence Matching Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[38] Tao Xiang,et al. Learning to Compare: Relation Network for Few-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[39] Bernard Ghanem,et al. SST: Single-Stream Temporal Action Proposals , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[41] Fabio Viola,et al. The Kinetics Human Action Video Dataset , 2017, ArXiv.
[42] Limin Wang,et al. Temporal Action Detection with Structured Segment Networks , 2017, International Journal of Computer Vision.
[43] Larry S. Davis,et al. Soft-NMS — Improving Object Detection with One Line of Code , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[44] Kate Saenko,et al. R-C3D: Region Convolutional 3D Network for Temporal Activity Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[45] R. Nevatia,et al. TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[46] Richard S. Zemel,et al. Prototypical Networks for Few-shot Learning , 2017, NIPS.
[47] Luc Van Gool,et al. UntrimmedNets for Weakly Supervised Action Recognition and Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[48] Haroon Idrees,et al. The THUMOS challenge on action recognition for videos "in the wild" , 2016, Computer Vision and Image Understanding.
[49] Oriol Vinyals,et al. Matching Networks for One Shot Learning , 2016, NIPS.
[50] Bernard Ghanem,et al. ActivityNet: A large-scale video benchmark for human activity understanding , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[51] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[52] Shaogang Gong,et al. Transductive Multi-View Zero-Shot Learning , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[53] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.