Cross-Domain Image Captioning with Discriminative Finetuning
Nathanaël Carraz Rakotonirina | Marco Baroni | Francesca Franzon | Michele Bevilacqua | Roberto Dessì | Eleonora Gualdoni
[1] André Susano Pinto, et al. Tuning computer vision models with task rewards, 2023, ICML.
[2] Rita Cucchiara, et al. From Show to Tell: A Survey on Deep Learning-Based Image Captioning, 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[3] Mitesh M. Khapra, et al. A Survey of Evaluation Metrics Used for NLG Systems, 2020, ACM Computing Surveys.
[4] Marco Baroni, et al. Communication breakdown: On the low mutual intelligibility between human and neural captioning, 2022, EMNLP.
[5] S. Savarese, et al. LAVIS: A Library for Language-Vision Intelligence, 2022, arXiv.
[6] Mohit Bansal, et al. Fine-grained Image Captioning with CLIP Reward, 2022, NAACL-HLT.
[7] Ronan Le Bras, et al. Multimodal Knowledge Alignment with Reinforcement Learning, 2022, arXiv.
[8] Yash Goyal, et al. Image Retrieval from Contextual Descriptions, 2022, ACL.
[9] Ryan J. Lowe, et al. Training language models to follow instructions with human feedback, 2022, NeurIPS.
[10] Marcella Cornia, et al. CaMEL: Mean Teacher Learning for Image Captioning, 2022, ICPR.
[11] S. Hoi, et al. BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation, 2022, ICML.
[12] Noah D. Goodman, et al. Concadia: Towards Image-Based Text Generation with a Purpose, 2021, EMNLP.
[13] Jeff Wu, et al. WebGPT: Browser-assisted question-answering with human feedback, 2021, arXiv.
[14] Ron Mokady, et al. ClipCap: CLIP Prefix for Image Captioning, 2021, arXiv.
[15] Ronan Le Bras, et al. CLIPScore: A Reference-free Evaluation Metric for Image Captioning, 2021, EMNLP.
[16] Ilya Sutskever, et al. Learning Transferable Visual Models From Natural Language Supervision, 2021, ICML.
[17] Jiebo Luo, et al. Cross-Domain Image Captioning via Cross-Modal Retrieval and Model Adaptation, 2020, IEEE Transactions on Image Processing.
[18] S. Gelly, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2020, ICLR.
[19] Ming-Wei Chang, et al. CapWAP: Image Captioning with a Purpose, 2020, EMNLP.
[20] Ryan J. Lowe, et al. Learning to summarize from human feedback, 2020, NeurIPS.
[21] Eugene Kharitonov, et al. EGG: a toolkit for research on Emergence of lanGuage in Games, 2019, EMNLP.
[22] J. Gray, et al. PsychoPy2: Experiments in behavior made easy, 2019, Behavior Research Methods.
[23] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[24] Xinlei Chen, et al. nocaps: novel object captioning at scale, 2019, ICCV.
[25] Radu Soricut, et al. Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning, 2018, ACL.
[26] Xiaogang Wang, et al. Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data, 2018, ECCV.
[27] Gregory Shakhnarovich, et al. Discriminability Objective for Training Descriptive Captions, 2018, CVPR.
[28] Lei Zhang, et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering, 2018, CVPR.
[29] Bo Dai, et al. Contrastive Learning for Image Captioning, 2017, NIPS.
[30] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[31] Min Sun, et al. Show, Adapt and Tell: Adversarial Training of Cross-Domain Image Captioner, 2017, ICCV.
[32] Vaibhava Goel, et al. Self-Critical Sequence Training for Image Captioning, 2017, CVPR.
[33] Nazli Ikizler-Cinbis, et al. Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures (Extended Abstract), 2017, IJCAI.
[34] Fei-Fei Li, et al. Deep visual-semantic alignments for generating image descriptions, 2015, CVPR.
[35] Basura Fernando, et al. SPICE: Semantic Propositional Image Caption Evaluation, 2016, ECCV.
[36] Marc'Aurelio Ranzato, et al. Sequence Level Training with Recurrent Neural Networks, 2016, ICLR.
[37] Yoshua Bengio, et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, 2015, ICML.
[38] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2015, ICLR.
[39] C. Lawrence Zitnick, et al. CIDEr: Consensus-based image description evaluation, 2015, CVPR.
[40] Trevor Darrell, et al. Long-term recurrent convolutional networks for visual recognition and description, 2015, CVPR.
[41] Samy Bengio, et al. Show and tell: A neural image caption generator, 2015, CVPR.
[42] Vicente Ordonez, et al. ReferItGame: Referring to Objects in Photographs of Natural Scenes, 2014, EMNLP.
[43] Alon Lavie, et al. Meteor Universal: Language Specific Translation Evaluation for Any Target Language, 2014, WMT@ACL.
[44] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[45] Peter Young, et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions, 2014, TACL.
[46] Armin W. Schulz. Signals: evolution, learning, and information, 2012.
[47] M. Guasti. How Children Learn the Meanings of Words, 2010.
[48] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[49] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[50] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.