Paper Information - Open-Vocabulary DETR with Conditional Matching - 字舞流文

Open-Vocabulary DETR with Conditional Matching

Chen Change Loy | Wei Li | Chen Huang | Kaiyang Zhou | Yuhang Zang

[1] Kaiyang Zhou,et al. Neural Prompt Search , 2022, ArXiv.

[2] Miaojing Shi,et al. Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3] Chen Change Loy,et al. Conditional Prompt Learning for Vision-Language Models , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4] Armand Joulin,et al. Detecting Twenty-thousand Classes using Image-level Supervision , 2022, ECCV.

[5] Chen Change Loy,et al. Learning to Prompt for Vision-Language Models , 2021, International Journal of Computer Vision.

[6] Dahun Kim,et al. Learning Open-World Object Proposals without Learning to Classify , 2021, IEEE Robotics and Automation Letters.

[7] Yin Cui,et al. Open-vocabulary Object Detection via Vision and Language Knowledge Distillation , 2021, ICLR.

[8] Depu Meng,et al. Conditional DETR for Fast Training Convergence , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[9] Yann LeCun,et al. MDETR - Modulated Detection for End-to-End Multi-Modal Understanding , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[10] Ilya Sutskever,et al. Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.

[11] Peng Gao,et al. Fast Convergence of DETR with Spatially Modulated Co-Attention , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[12] Shih-Fu Chang,et al. Open-Vocabulary Object Detection Using Captions , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13] Zhigang Dai,et al. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14] L. Sigal,et al. An Improved Attention for Visual Question Answering , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[15] Bin Li,et al. Deformable DETR: Deformable Transformers for End-to-End Object Detection , 2020, ICLR.

[16] Junnan Li,et al. Towards Open Vocabulary Object Detection without Human-provided Bounding Boxes , 2021, ArXiv.

[17] Shuai Zheng,et al. ZSD-YOLO: Zero-Shot YOLO Detection using Vision-Language Knowledge Distillation , 2021, ArXiv.

[18] Han Fang,et al. Linformer: Self-Attention with Linear Complexity , 2020, ArXiv.

[19] Nicolas Usunier,et al. End-to-End Object Detection with Transformers , 2020, ECCV.

[20] Liu Yang,et al. Sparse Sinkhorn Attention , 2020, ICML.

[21] Changxin Gao,et al. GTNet: Generative Transfer Network for Zero-Shot Object Detection , 2020, AAAI.

[22] Venkatesh Saligrama,et al. Don’t Even Look Once: Synthesizing Features for Zero-Shot Detection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23] Ross B. Girshick,et al. Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24] Ross B. Girshick,et al. Mask R-CNN , 2017, arXiv:1703.06870.

[25] Lina Yao,et al. Zero-Shot Object Detection with Textual Descriptions , 2019, AAAI.

[26] Ross B. Girshick,et al. LVIS: A Dataset for Large Vocabulary Instance Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27] Silvio Savarese,et al. Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[29] Qi Wu,et al. Visual Grounding via Accumulated Attention , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[30] Rama Chellappa,et al. Zero-Shot Object Detection , 2018, ECCV.

[31] Ramakant Nevatia,et al. Query-Guided Regression Network with Context Policy for Phrase Grounding , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[32] Xiaogang Wang,et al. Person Search with Natural Language Description , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34] Trevor Darrell,et al. Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.

[36] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[37] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.

[38] Marc'Aurelio Ranzato,et al. DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.

[39] Dumitru Erhan,et al. Deep Neural Networks for Object Detection , 2013, NIPS.

[40] Harold W. Kuhn,et al. The Hungarian method for the assignment problem , 1955, 50 Years of Integer Programming.

[41] Luc Van Gool,et al. The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[42] David A. McAllester,et al. A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[43] Tomaso A. Poggio,et al. A Trainable System for Object Detection , 2000, International Journal of Computer Vision.