Are metrics measuring what they should? An evaluation of image captioning task metrics
Othón González-Chávez | Guillermo Ruiz | Daniela Moctezuma | Tania A. Ramírez-delReal