Caption Anything: Interactive Image Description with Diverse Multimodal Controls
Ying Shan | Yixiao Ge | Teng Wang | Shanshan Zhao | Mingqi Gao | Zhe Li | Junjie Fei | Hao Zheng | Jinrui Zhang | Yun-Qiu Tang | Feng Zheng
[1] Ying Shan,et al. Accelerating Vision-Language Pretraining with Free Language Modeling , 2023, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Henrique Pondé de Oliveira Pinto,et al. GPT-4 Technical Report , 2023, arXiv:2303.08774.
[3] Chenfei Wu,et al. Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models , 2023, ArXiv.
[4] Zhengjue Wang,et al. ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing , 2023, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Naman Goyal,et al. LLaMA: Open and Efficient Foundation Language Models , 2023, ArXiv.
[6] S. Savarese,et al. BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models , 2023, ArXiv.
[7] Jiahao Xie,et al. Controllable Image Captioning via Prompting , 2022, AAAI.
[8] Zhe Gan,et al. GRiT: A Generative Region-to-text Transformer for Object Understanding , 2022, ArXiv.
[9] Guillem Cucurull,et al. Galactica: A Large Language Model for Science , 2022, ArXiv.
[10] Alexander M. Rush,et al. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model , 2022, ArXiv.
[11] Dragomir R. Radev,et al. Crosslingual Generalization through Multitask Finetuning , 2022, ArXiv.
[12] Andrew M. Dai,et al. Scaling Instruction-Finetuned Language Models , 2022, ArXiv.
[13] Q. Tian,et al. DeeCap: Dynamic Early Exiting for Efficient Image Captioning , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Ting Yao,et al. Comprehending and Ordering Semantics for Image Captioning , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Zhe Gan,et al. GIT: A Generative Image-to-text Transformer for Vision and Language , 2022, Trans. Mach. Learn. Res..
[16] S. Gu,et al. Large Language Models are Zero-Shot Reasoners , 2022, NeurIPS.
[17] Xi Victoria Lin,et al. OPT: Open Pre-trained Transformer Language Models , 2022, ArXiv.
[18] Kurt Debattista,et al. Region-Object Relation-Aware Dense Captioning via Transformer , 2022, IEEE Transactions on Neural Networks and Learning Systems.
[19] Ryan J. Lowe,et al. Training language models to follow instructions with human feedback , 2022, NeurIPS.
[20] S. Hoi,et al. BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation , 2022, ICML.
[21] Dale Schuurmans,et al. Chain of Thought Prompting Elicits Reasoning in Large Language Models , 2022, NeurIPS.
[22] Xiaowei Hu,et al. Injecting Semantic Concepts into End-to-End Image Captioning , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Xiaowei Hu,et al. Scaling Up Vision-Language Pretraining for Image Captioning , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Wei Liu,et al. Human-like Controllable Image Captioning with Verb-specific Semantic Roles , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Nan Duan,et al. Control Image Captioning Spatially and Temporally , 2021, ACL.
[26] Ning Ding,et al. Length-Controllable Image Captioning , 2020, ECCV.
[27] Zhao Zhang,et al. Interactive Image Segmentation With First Click Attention , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[29] Wentian Zhao,et al. MemCap: Memorizing Style Knowledge for Image Captioning , 2020, AAAI.
[30] Ilia Petrov,et al. F-BRS: Rethinking Backpropagating Refinement for Interactive Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Jordi Pont-Tuset,et al. Connecting Vision and Language with Localized Narratives , 2019, ECCV.
[32] Jie Chen,et al. Attention on Attention for Image Captioning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[33] Jungong Han,et al. Learning Object Context for Dense Captioning , 2019, AAAI.
[34] Nenghai Yu,et al. Context and Attribute Grounded Dense Captioning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Rita Cucchiara,et al. Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Sébastien Ourselin,et al. DeepIGeoS: A Deep Interactive Geodesic Framework for Medical Image Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[37] Zhuwen Li,et al. Interactive Image Segmentation with Latent Diversity , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[38] Bastian Leibe,et al. Iteratively Trained Interactive Segmentation , 2018, BMVC.
[39] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[40] Sim Heng Ong,et al. Regional Interactive Image Segmentation Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[41] Zhe Gan,et al. StyleNet: Generating Attractive Visual Captions with Styles , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Li-Jia Li,et al. Dense Captioning with Joint Inference and Visual Context , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[43] Ning Xu,et al. Deep Interactive Object Selection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Li Fei-Fei,et al. DenseCap: Fully Convolutional Localization Networks for Dense Captioning , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[45] Lexing Xie,et al. SentiCap: Generating Image Descriptions with Sentiments , 2015, AAAI.
[46] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[47] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).