A thorough review of models, evaluation metrics, and datasets on image captioning
