Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training