ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing