Reveal: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory
David A. Ross | C. Schmid | Chen Sun | Kai-Wei Chang | A. Fathi | Yizhou Sun | Ziniu Hu | Zirui Wang | Ahmet Iscen
[1] Kai-Wei Chang,et al. Empowering Language Models with Knowledge Graph Reasoning for Open-Domain Question Answering , 2022, EMNLP.
[2] Ashish V. Thapliyal,et al. PaLI: A Jointly-Scaled Multilingual Language-Image Model , 2022, ArXiv.
[3] Li Dong,et al. Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks , 2022, ArXiv.
[4] Jane A. Yu,et al. Few-shot Learning with Retrieval Augmented Language Models , 2022, J. Mach. Learn. Res.
[5] Blake A. Hechtman,et al. TPU-KNN: K Nearest Neighbor Search at Peak FLOP/s , 2022, NeurIPS.
[6] Dustin Schwenk,et al. A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge , 2022, ECCV.
[7] Lu Yuan,et al. REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering , 2022, NeurIPS.
[8] J. Leskovec,et al. VQA-GNN: Reasoning with Multimodal Semantic Graph for Visual Question Answering , 2022, ArXiv.
[9] Ed H. Chi,et al. Improving Multi-Task Generalization via Regularizing Spurious Correlation , 2022, NeurIPS.
[10] Zirui Wang,et al. CoCa: Contrastive Captioners are Image-Text Foundation Models , 2022, Trans. Mach. Learn. Res.
[11] Oriol Vinyals,et al. Flamingo: a Visual Language Model for Few-Shot Learning , 2022, NeurIPS.
[12] Andrew M. Dai,et al. PaLM: Scaling Language Modeling with Pathways , 2022, J. Mach. Learn. Res.
[13] Chunhua Shen,et al. Retrieval Augmented Classification for Long-Tail Visual Recognition , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Derek Hoiem,et al. Webly Supervised Concept Expansion for General Purpose Vision Models , 2022, ECCV.
[15] Yonatan Bisk,et al. KAT: A Knowledge Augmented Transformer for Vision-and-Language , 2021, NAACL.
[16] Daniel Keysers,et al. LiT: Zero-Shot Transfer with Locked-image text Tuning , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Zhe Gan,et al. An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA , 2021, AAAI.
[18] Adams Wei Yu,et al. SimVLM: Simple Visual Language Model Pretraining with Weak Supervision , 2021, ICLR.
[19] Alexander Kolesnikov,et al. Scaling Vision Transformers , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[20] Roozbeh Mottaghi,et al. Multi-Modal Answer Validation for Knowledge-Based VQA , 2021, AAAI.
[21] Ron Mokady,et al. ClipCap: CLIP Prefix for Image Captioning , 2021, ArXiv.
[22] Pratyay Banerjee,et al. Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering , 2021, EMNLP.
[23] Dani Yogatama,et al. End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering , 2021, NeurIPS.
[24] Yejin Choi,et al. VinVL: Revisiting Visual Representations in Vision-Language Models , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Andrew Zisserman,et al. Perceiver: General Perception with Iterative Attention , 2021, ICML.
[26] Jiecao Chen,et al. WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning , 2021, SIGIR.
[27] Radu Soricut,et al. Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Quoc V. Le,et al. Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision , 2021, ICML.
[29] Marcus Rohrbach,et al. KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Edouard Grave,et al. Distilling Knowledge from Reader to Retriever for Question Answering , 2020, ICLR.
[31] S. Gelly,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.
[32] François Gardères,et al. ConceptBert: Concept-Aware Representation for Visual Question Answering , 2020, Findings of EMNLP.
[33] Fabio Petroni,et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks , 2020, NeurIPS.
[34] Danqi Chen,et al. Dense Passage Retrieval for Open-Domain Question Answering , 2020, EMNLP.
[35] Ming-Wei Chang,et al. REALM: Retrieval-Augmented Language Model Pre-Training , 2020, ICML.
[36] Colin Raffel,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res.
[37] Dmytro Okhonko,et al. Unified Open-Domain Question Answering with Structured and Unstructured Knowledge , 2020, ArXiv.
[38] Mohit Bansal,et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers , 2019, EMNLP.
[39] Stefan Lee,et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks , 2019, NeurIPS.
[40] Ali Farhadi,et al. OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Jimmy J. Lin,et al. End-to-End Open-Domain Question Answering with BERTserini , 2019, NAACL.
[42] Xinlei Chen,et al. nocaps: novel object captioning at scale , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[43] Svetlana Lazebnik,et al. Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering , 2018, NeurIPS.
[44] Qi Wu,et al. FVQA: Fact-Based Visual Question Answering , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[45] Yash Goyal,et al. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering , 2016, International Journal of Computer Vision.
[46] Chunhua Shen,et al. Explicit Knowledge-based Reasoning for Visual Question Answering , 2015, IJCAI.
[47] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[48] Xinlei Chen,et al. Microsoft COCO Captions: Data Collection and Evaluation Server , 2015, ArXiv.
[49] C. Lawrence Zitnick,et al. CIDEr: Consensus-based image description evaluation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[50] Markus Krötzsch,et al. Wikidata , 2014, Commun. ACM.
[51] Ping Li,et al. Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS) , 2014, NIPS.