ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models
Yong Jae Lee | Liunian Harold Li | Jianfeng Gao | Pengchuan Zhang | Chengkun Li | Zicheng Liu | Jianwei Yang | Houdong Hu | J. Aneja | Haotian Liu | Ping Jin | Chunyuan Li
[1] Trevor Darrell,et al. K-LITE: Learning Transferable Visual Models with External Knowledge , 2022, NeurIPS.
[2] Jianfeng Gao,et al. Unified Contrastive Learning in Image-Text-Label Space , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Serge J. Belongie,et al. Visual Prompt Tuning , 2022, ECCV.
[4] Chen Change Loy,et al. Conditional Prompt Learning for Vision-Language Models , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Ping Luo,et al. Context Autoencoder for Self-Supervised Representation Learning , 2022, ArXiv.
[6] Siva Reddy,et al. IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages , 2022, ICML.
[7] Saining Xie,et al. SLIP: Self-supervision meets Language-Image Pre-training , 2021, ECCV.
[8] Lu Yuan,et al. RegionCLIP: Region-based Language-Image Pretraining , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Mohit Bansal,et al. VL-ADAPTER: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Liunian Harold Li,et al. Grounded Language-Image Pre-training , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Ross B. Girshick,et al. Masked Autoencoders Are Scalable Vision Learners , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Zhenguo Li,et al. FILIP: Fine-grained Interactive Language-Image Pre-Training , 2021, ICLR.
[13] Alexander M. Rush,et al. Multitask Prompted Training Enables Zero-Shot Task Generalization , 2021, ICLR.
[14] Junjie Yan,et al. Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm , 2021, ICLR.
[15] Hiroaki Hayashi,et al. Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing , 2021, ACM Comput. Surv..
[16] Lu Yuan,et al. Florence: A New Foundation Model for Computer Vision , 2021, ArXiv.
[17] Peng Gao,et al. CLIP-Adapter: Better Vision-Language Models with Feature Adapters , 2021, Int. J. Comput. Vis..
[18] Maosong Sun,et al. CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models , 2021, ArXiv.
[19] Junnan Li,et al. Align before Fuse: Vision and Language Representation Learning with Momentum Distillation , 2021, NeurIPS.
[20] Zhe Gan,et al. VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation , 2021, NeurIPS Datasets and Benchmarks.
[21] Lu Yuan,et al. Dynamic Head: Unifying Object Detection Heads with Attentions , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Frank Hutter,et al. DEHB: Evolutionary Hyperband for Scalable, Robust and Efficient Hyperparameter Optimization , 2021, IJCAI.
[23] Saining Xie,et al. An Empirical Study of Training Self-Supervised Vision Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[24] Ilya Sutskever,et al. Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.
[25] Radu Soricut,et al. Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Quoc V. Le,et al. Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision , 2021, ICML.
[27] Matthieu Cord,et al. Training data-efficient image transformers & distillation through attention , 2020, ICML.
[28] Shih-Fu Chang,et al. Open-Vocabulary Object Detection Using Captions , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[29] S. Gelly,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.
[30] Nicola De Cao,et al. KILT: a Benchmark for Knowledge Intensive Language Tasks , 2020, NAACL.
[31] David Lopez-Paz,et al. In Search of Lost Domain Generalization , 2020, ICLR.
[32] Stephen Lin,et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[33] Xiuye Gu,et al. Zero-Shot Detection via Vision and Language Knowledge Distillation , 2021, ArXiv.
[34] Percy Liang,et al. Prefix-Tuning: Optimizing Continuous Prompts for Generation , 2021, ACL.
[35] Mohit Bansal,et al. Vokenization: Improving Language Understanding via Contextualized, Visually-Grounded Supervision , 2020, EMNLP.
[36] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[37] Douwe Kiela,et al. The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes , 2020, NeurIPS.
[38] Trevor Darrell,et al. Frustratingly Simple Few-Shot Object Detection , 2020, ICML.
[39] Naman Jain,et al. PlantDoc: A Dataset for Visual Plant Disease Detection , 2019, COMAD/CODS.
[40] André Susano Pinto,et al. A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark , 2019, ArXiv:1910.04867.
[41] Jian Sun,et al. Objects365: A Large-Scale, High-Quality Dataset for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[42] Thomas B. Moeslund,et al. Detection of Marine Animals in a New Underwater Dataset with Varying Visibility , 2019, CVPR Workshops.
[43] Ross B. Girshick,et al. LVIS: A Dataset for Large Vocabulary Instance Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Ali Farhadi,et al. OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[45] Omer Levy,et al. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems , 2019, NeurIPS.
[46] Nuno Vasconcelos,et al. Towards Universal Object Detection by Domain Attention , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[47] Mona Attariyan,et al. Parameter-Efficient Transfer Learning for NLP , 2019, ICML.
[48] Bo Wang,et al. Moment Matching for Multi-Source Domain Adaptation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[49] Andreas Dengel,et al. EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification , 2017, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.
[50] Christoph H. Lampert,et al. Zero-Shot Learning—A Comprehensive Evaluation of the Good, the Bad and the Ugly , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[51] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[52] Aaron Klein,et al. BOHB: Robust and Efficient Hyperparameter Optimization at Scale , 2018, ICML.
[53] Radu Soricut,et al. Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning , 2018, ACL.
[54] Max Welling,et al. Rotation Equivariant CNNs for Digital Pathology , 2018, MICCAI.
[55] Omer Levy,et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding , 2018, BlackboxNLP@EMNLP.
[56] Zicheng Liu,et al. Reinforced Temporal Attention and Split-Rate Transfer for Depth-Based Person Re-identification , 2017, ECCV.
[57] Xiaoqiang Lu,et al. Remote Sensing Image Scene Classification: Benchmark and State of the Art , 2017, Proceedings of the IEEE.
[58] Allan Jabri,et al. Learning Visual N-Grams from Web Data , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[59] Yanwei Fu,et al. Semi-supervised Vocabulary-Informed Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[60] David A. Shamma,et al. YFCC100M , 2015, Commun. ACM.
[61] Stefan Lee,et al. Lending A Hand: Detecting Hands and Recognizing Activities in Complex Egocentric Interactions , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[62] Svetlana Lazebnik,et al. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models , 2015, International Journal of Computer Vision.
[63] Alexei A. Efros,et al. Unsupervised Visual Representation Learning by Context Prediction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[64] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[65] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[66] Yoshua Bengio,et al. How transferable are features in deep neural networks? , 2014, NIPS.
[67] Matthieu Guillaumin,et al. Food-101 - Mining Discriminative Components with Random Forests , 2014, ECCV.
[68] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[69] Christoph H. Lampert,et al. Attribute-Based Classification for Zero-Shot Visual Object Categorization , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[70] Trevor Darrell,et al. Recognizing Image Style , 2013, BMVC.
[71] Iasonas Kokkinos,et al. Describing Textures in the Wild , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[72] Jonathan Krause,et al. 3D Object Representations for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision Workshops.
[73] Jannik Fritsch,et al. A new performance measure and evaluation benchmark for road detection algorithms , 2013, 16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013).
[74] Andreas Geiger,et al. Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..
[75] Subhransu Maji,et al. Fine-Grained Visual Classification of Aircraft , 2013, ArXiv.
[76] Mubarak Shah,et al. UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild , 2012, ArXiv.
[77] Iryna Gurevych,et al. Wiktionary: a new rival for expert-built lexicons? Exploring the possibilities of collaborative lexicography , 2012 .
[78] L. Deng,et al. The MNIST Database of Handwritten Digit Images for Machine Learning Research [Best of the Web] , 2012, IEEE Signal Processing Magazine.
[79] C. V. Jawahar,et al. Cats and dogs , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[80] James Hays,et al. SUN attribute database: Discovering, annotating, and recognizing scene attributes , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[81] Johannes Stallkamp,et al. The German Traffic Sign Recognition Benchmark: A multi-class classification competition , 2011, The 2011 International Joint Conference on Neural Networks.
[82] Pietro Perona,et al. The Caltech-UCSD Birds-200-2011 Dataset , 2011 .
[83] Bernt Schiele,et al. Evaluating knowledge transfer and zero-shot learning in a large-scale setting , 2011, CVPR 2011.
[84] Richard Szeliski,et al. Computer Vision - Algorithms and Applications , 2011, Texts in Computer Science.
[85] Andrew Y. Ng,et al. Reading Digits in Natural Images with Unsupervised Feature Learning , 2011 .
[86] Luc Van Gool,et al. The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.
[87] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[88] Christoph H. Lampert,et al. Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[89] Ali Farhadi,et al. Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[90] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .
[91] Andrew Zisserman,et al. Automated Flower Classification over a Large Number of Classes , 2008, 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing.
[92] Jianguo Zhang,et al. The PASCAL Visual Object Classes Challenge , 2006 .
[93] Pietro Perona,et al. Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.
[94] Christiane Fellbaum,et al. Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.