Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning
暂无分享,去创建一个
Xiyang Dai | Michael Zeng | Luowei Zhou | Nguyen Bach | Lu Yuan | Yujia Xie | Ce Liu
[1] Dani Yogatama,et al. Language Models Can See: Plugging Visual Controls in Text Generation , 2022, ArXiv.
[2] Zirui Wang,et al. CoCa: Contrastive Captioners are Image-Text Foundation Models , 2022, Trans. Mach. Learn. Res..
[3] Li Fei-Fei,et al. Searching for Computer Vision North Stars , 2022, Daedalus.
[4] Oriol Vinyals,et al. Flamingo: a Visual Language Model for Few-Shot Learning , 2022, ArXiv.
[5] Tristan Thrush,et al. Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Adrian S. Wong,et al. Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language , 2022, ICLR.
[7] Jingren Zhou,et al. OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework , 2022, ICML.
[8] S. Hoi,et al. BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation , 2022, ICML.
[9] Marcus Rohrbach,et al. FLAVA: A Foundational Language And Vision Alignment Model , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Xizhou Zhu,et al. Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Zhe Gan,et al. An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA , 2021, AAAI.
[12] Adams Wei Yu,et al. SimVLM: Simple Visual Language Model Pretraining with Weak Supervision , 2021, ICLR.
[13] Mingyuan Zhou,et al. Matching Visual Features to Hierarchical Semantic Topics for Image Paragraph Captioning , 2021, International Journal of Computer Vision.
[14] Mohamed Elhoseiny,et al. VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Zhanyu Ma,et al. S2TD: A Tree-Structured Decoder for Image Paragraph Captioning , 2021, MMAsia.
[16] Lu Yuan,et al. Florence: A New Foundation Model for Computer Vision , 2021, ArXiv.
[17] Lu Yuan,et al. Dynamic Head: Unifying Object Detection Heads with Attentions , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Ilya Sutskever,et al. Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.
[19] Quoc V. Le,et al. Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision , 2021, ICML.
[20] Jaemin Cho,et al. Unifying Vision-and-Language Tasks via Text Generation , 2021, ICML.
[21] Mona T. Diab,et al. Detecting Hallucinated Content in Conditional Neural Sequence Generation , 2020, FINDINGS.
[22] Chengming Li,et al. Interactive Key-Value Memory-augmented Attention for Image Paragraph Captioning , 2020, COLING.
[23] Naman K. Gupta,et al. ultralytics/yolov5: v3.1 - Bug Fixes and Performance Improvements , 2020 .
[24] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[25] Ryan McDonald,et al. On Faithfulness and Factuality in Abstractive Summarization , 2020, ACL.
[26] Jason J. Corso,et al. Unified Vision-Language Pre-Training for Image Captioning and VQA , 2019, AAAI.
[27] Tao Mei,et al. Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation , 2019, IJCAI.
[28] Zi Huang,et al. Curiosity-driven Reinforcement Learning for Diverse Visual Paragraph Generation , 2019, ACM Multimedia.
[29] Ali Farhadi,et al. OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Jieping Ye,et al. Object Detection in 20 Years: A Survey , 2019, Proceedings of the IEEE.
[31] Christopher D. Manning,et al. GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Amaia Salvador,et al. Inverse Cooking: Recipe Generation From Food Images , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Yash Goyal,et al. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering , 2016, International Journal of Computer Vision.
[34] Alexander G. Schwing,et al. Diverse and Coherent Paragraph Generation from Images , 2018, ECCV.
[35] Masatoshi Yoshikawa,et al. Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training , 2018, ACM Multimedia.
[36] Alexander M. Rush,et al. Training for Diversity in Image Paragraph Captioning , 2018, EMNLP.
[37] Xiaoyu Shen,et al. DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset , 2017, IJCNLP.
[38] Chuang Gan,et al. Recurrent Topic-Transition GAN for Visual Paragraph Generation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[39] Sanja Fidler,et al. Towards Diverse and Natural Image Descriptions via a Conditional GAN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[40] Jonathan Krause,et al. A Hierarchical Approach for Generating Descriptive Image Paragraphs , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Basura Fernando,et al. SPICE: Semantic Propositional Image Caption Evaluation , 2016, ECCV.
[42] Jian Zhang,et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text , 2016, EMNLP.
[43] Francis Ferraro,et al. Visual Storytelling , 2016, NAACL.
[44] Michael S. Bernstein,et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.
[45] Li Fei-Fei,et al. Generating Semantically Precise Scene Graphs from Textual Descriptions for Improved Image Retrieval , 2015, VL@EMNLP.
[46] Michael S. Bernstein,et al. Image retrieval using scene graphs , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[47] Xinlei Chen,et al. Microsoft COCO Captions: Data Collection and Evaluation Server , 2015, ArXiv.
[48] C. Lawrence Zitnick,et al. CIDEr: Consensus-based image description evaluation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[49] Alon Lavie,et al. Meteor Universal: Language Specific Translation Evaluation for Any Target Language , 2014, WMT@ACL.
[50] Chin-Yew Lin,et al. ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.
[51] Dan Klein,et al. Accurate Unlexicalized Parsing , 2003, ACL.
[52] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[53] George A. Miller,et al. WordNet: A Lexical Database for English , 1995, HLT.
[54] R. Weale. Vision. A Computational Investigation Into the Human Representation and Processing of Visual Information. David Marr , 1983 .