Deep image captioning: A review of methods, trends and future challenges
[1] Suhyun Cho,et al. Generalized Image Captioning for Multilingual Support , 2023, Applied Sciences.
[2] Taro Watanabe,et al. Switching to Discriminative Image Captioning by Relieving a Bottleneck of Reinforcement Learning , 2022, 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).
[3] Jun Wang,et al. An Overview of the Stability Analysis of Recurrent Neural Networks With Multiple Equilibria , 2021, IEEE Transactions on Neural Networks and Learning Systems.
[4] Rita Cucchiara,et al. From Show to Tell: A Survey on Deep Learning-Based Image Captioning , 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[5] A. Mian,et al. Language Model Agnostic Gray-Box Adversarial Attack on Image Captioning , 2023, IEEE Transactions on Information Forensics and Security.
[6] Jun Yu,et al. Joint Embedding of Deep Visual and Semantic Features for Medical Image Report Generation , 2023, IEEE Transactions on Multimedia.
[7] Joseph Keshet,et al. A Baseline for Detecting Out-of-Distribution Examples in Image Captioning , 2022, ACM Multimedia.
[8] P. Sudeep,et al. Image Captioning Encoder–Decoder Models Using CNN-RNN Architectures: A Comparative Study , 2022, Circuits, Systems, and Signal Processing.
[9] Solon Barocas,et al. Measuring Representational Harms in Image Captioning , 2022, FAccT.
[10] M. Ackerman,et al. “So What? What's That to Do With Me?” Expectations of People With Visual Impairments for Image Descriptions in Their Personal Photo Activities , 2022, Conference on Designing Interactive Systems.
[11] David Abou Chacra,et al. The Topology and Language of Relationships in the Visual Genome Dataset , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[12] Dan Guo,et al. Memorial GAN With Joint Semantic Optimization for Unpaired Image Captioning , 2022, IEEE Transactions on Cybernetics.
[13] David A. Ross,et al. What’s in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[14] Abdulganiyu Abdu Yusuf,et al. An analysis of graph convolutional networks and recent datasets for visual question answering , 2022, Artificial Intelligence Review.
[15] Noa García,et al. Quantifying Societal Bias Amplification in Image Captioning , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Ziyu Guan,et al. Special Issue on Decision Making in Heterogeneous Network Data Scenarios and Applications , 2022, World Wide Web.
[17] Hongmin Cai,et al. Learning Transferable Perturbations for Image Captioning , 2022, ACM Trans. Multim. Comput. Commun. Appl..
[18] Yuming Fang,et al. Revisiting image captioning via maximum discrepancy competition , 2022, Pattern Recognit..
[19] Huifang Ma,et al. Dual Global Enhanced Transformer for image captioning , 2022, Neural Networks.
[20] Zhihui Li,et al. A Comprehensive Survey of Neural Architecture Search , 2021, ACM Comput. Surv..
[21] Mingyuan Zhou,et al. Matching Visual Features to Hierarchical Semantic Topics for Image Paragraph Captioning , 2021, International Journal of Computer Vision.
[22] Mengchu Zhou,et al. Dynamic Embedding Projection-Gated Convolutional Neural Networks for Text Classification , 2021, IEEE Transactions on Neural Networks and Learning Systems.
[23] Qiang Wu,et al. Dual Attention on Pyramid Feature Maps for Image Captioning , 2020, IEEE Transactions on Multimedia.
[24] Xiaodan Liang,et al. Unifying Relational Sentence Generation and Retrieval for Medical Image Report Composition , 2020, IEEE Transactions on Cybernetics.
[25] Weili Guan,et al. Chinese Image Caption Generation via Visual Attention and Topic Modeling , 2020, IEEE Transactions on Cybernetics.
[26] Yongdong Zhang,et al. Context-Aware Visual Policy Network for Fine-Grained Image Captioning , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[27] Meredith Ringel Morris,et al. Going Beyond One-Size-Fits-All Image Descriptions to Satisfy the Information Wants of People Who are Blind or Have Low Vision , 2021, ASSETS.
[28] Zhengping Che,et al. Hierarchical Graph Attention Network for Few-shot Visual-Semantic Learning , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[29] Achleshwar Luthra,et al. MedSkip: Medical Report Generation Using Skip Connections and Integrated Attention , 2021, 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW).
[30] Hongwei Mo,et al. Image Caption Generation Using Multi-Level Semantic Context Information , 2021, Symmetry.
[31] Olga Russakovsky,et al. Understanding and Evaluating Racial Biases in Image Captioning , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[32] Amjad Rehman,et al. Automatic medical image interpretation: State of the art and future directions , 2021, Pattern Recognit..
[33] Lijuan Wang,et al. VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning , 2021, AAAI.
[34] Klaus Diepold,et al. Multi-agent deep reinforcement learning: a survey , 2021, Artificial Intelligence Review.
[35] Yeganeh Madadi,et al. Adversarial Image Caption Generator Network , 2021, SN Computer Science.
[36] Wei Liu,et al. Human-like Controllable Image Captioning with Verb-specific Semantic Roles , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[37] Danielle Albers Szafir,et al. Connecting Human-Robot Interaction and Data Visualization , 2021, 2021 16th ACM/IEEE International Conference on Human-Robot Interaction (HRI).
[38] Zhiqiang Hou,et al. Research on Image Caption Based on Multiple Word Embedding Representations , 2021, 2021 3rd International Conference on Natural Language Processing (ICNLP).
[39] Christopher J. Anders,et al. Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications , 2021, Proceedings of the IEEE.
[40] Tanaya Guha,et al. In Defense of Scene Graphs for Image Captioning , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[41] Yong Wang,et al. Automatic ultrasound image report generation with adaptive multimodal attention mechanism , 2021, Neurocomputing.
[42] Vicente Ordonez,et al. Visual News: Benchmark and Challenges in News Image Captioning , 2020, EMNLP.
[43] Aske Plaat,et al. A survey of deep meta-learning , 2020, Artificial Intelligence Review.
[44] Ruixiang Tang,et al. Mitigating Gender Bias in Captioning Systems , 2020, WWW.
[45] Yilong Yin,et al. Unifying Neural Learning and Symbolic Reasoning for Spinal Medical Report Generation , 2020, Medical Image Anal..
[46] Hanwang Zhang,et al. Deconfounded Image Captioning: A Causal Retrospect , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[47] Aidong Zhang,et al. A Survey on Causal Inference , 2020, ACM Trans. Knowl. Discov. Data.
[48] Karin M. Verspoor,et al. FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark , 2021, NeurIPS Datasets and Benchmarks.
[49] Zhen Guo,et al. ImageSem Group at ImageCLEFmed Caption 2021 Task: Exploring the Clinical Significance of the Textual Descriptions Derived from Medical Images , 2021, CLEF.
[50] Nan Duan,et al. Control Image Captioning Spatially and Temporally , 2021, ACL.
[51] Ruqiang Yan,et al. Domain Adversarial Graph Convolutional Network for Fault Diagnosis Under Variable Working Conditions , 2021, IEEE Transactions on Instrumentation and Measurement.
[52] Zhenglong Sun,et al. Intention Understanding in Human–Robot Interaction Based on Visual-NLP Semantics , 2021, Frontiers in Neurorobotics.
[53] Md. Kishor Morol,et al. Image to Bengali Caption Generation Using Deep CNN and Bidirectional Gated Recurrent Unit , 2020, 2020 23rd International Conference on Computer and Information Technology (ICCIT).
[54] John D. Kelleher,et al. Language-Driven Region Pointer Advancement for Controllable Image Captioning , 2020, COLING.
[55] Tsung-Hui Chang,et al. Generating Radiology Reports via Memory-driven Transformer , 2020, EMNLP.
[56] Chengming Li,et al. An Ensemble of Generation- and Retrieval-Based Image Captioning With Dual Generator Generative Adversarial Network , 2020, IEEE Transactions on Image Processing.
[57] Xiaoshuai Sun,et al. Attacking Image Captioning Towards Accuracy-Preserving Target Words Removal , 2020, ACM Multimedia.
[58] Zhengcong Fei,et al. Iterative Back Modification for Faster Image Captioning , 2020, ACM Multimedia.
[59] Jianwei Niu,et al. Automatic Medical Image Report Generation with Multi-view and Multi-modal Attention Mechanism , 2020, ICA3PP.
[60] Bing Liu,et al. Remote sensing image captioning via Variational Autoencoder and Reinforcement Learning , 2020, Knowl. Based Syst..
[61] Ruiqin Xiong,et al. Visual Relationship Embedding Network for Image Paragraph Generation , 2020, IEEE Transactions on Multimedia.
[62] Usha Ruby Dr.A,et al. Binary cross entropy with deep learning technique for Image classification , 2020 .
[63] Anup Pillai,et al. Chest X-ray Report Generation through Fine-Grained Label Learning , 2020, MICCAI.
[64] Xing Xu,et al. Fooled by Imagination: Adversarial Attack to Image Captioning Via Perturbation in Complex Domain , 2020, 2020 IEEE International Conference on Multimedia and Expo (ICME).
[65] Yuling Xi,et al. Stimulus-driven and concept-driven analysis for image caption generation , 2020, Neurocomputing.
[66] Xu Zhou,et al. Improving Image Captioning with Better Use of Caption , 2020, ACL.
[67] Jingsong He,et al. Boosting image caption generation with feature fusion module , 2020, Multimedia Tools and Applications.
[68] Li Wen,et al. Deep learning for ultrasound image caption generation based on object detection , 2020, Neurocomputing.
[69] Zhe Gan,et al. Improving Adversarial Text Generation by Modeling the Distant Future , 2020, ACL.
[70] Jing Liu,et al. Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning , 2020, IJCAI.
[71] Jianfeng Gao,et al. Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks , 2020, ECCV.
[72] Hongyuan Zha,et al. Learning Long- and Short-Term User Literal-Preference with Multimodal Hierarchical Transformer Network for Personalized Image Caption , 2020, AAAI.
[73] Bodo Rosenhahn,et al. Image Captioning through Image Transformer , 2020, ACCV.
[74] Tao Mei,et al. X-Linear Attention Networks for Image Captioning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[75] Jing Liu,et al. Normalized and Geometry-Aware Self-Attention Network for Image Captioning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[76] Yiran Chen,et al. A Survey of Accelerator Architectures for Deep Neural Networks , 2020 .
[77] Weili Guan,et al. Image caption generation with dual attention mechanism , 2020, Inf. Process. Manag..
[78] Ajay Bansal,et al. Ensemble Learning on Deep Neural Networks for Image Caption Generation , 2020, 2020 IEEE 14th International Conference on Semantic Computing (ICSC).
[79] Junbo Wang,et al. Learning visual relationship and context-aware attention for image captioning , 2020, Pattern Recognit..
[80] Xinlei Chen,et al. In Defense of Grid Features for Visual Question Answering , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[81] Yue Zhang,et al. An Overview of Image Caption Generation Methods , 2020, Comput. Intell. Neurosci..
[82] Marcella Cornia,et al. Meshed-Memory Transformer for Image Captioning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[83] Heng Tao Shen,et al. Hierarchical LSTMs with Adaptive Attention for Visual Captioning , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[84] Xian Wu,et al. Prophet Attention: Predicting Attention with Future Attention , 2020, NeurIPS.
[85] Steven Horng,et al. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports , 2019, Scientific Data.
[86] Jaewoo Kang,et al. Graph Transformer Networks , 2019, NeurIPS.
[87] Xiaojun Wan,et al. Generating Diverse and Descriptive Image Captions Using Visual Paraphrases , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[88] Liang Sun,et al. Exploring Overall Contextual Information for Image Captioning in Human-Like Cognitive Style , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[89] Tao Mei,et al. Hierarchy Parsing for Image Captioning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[90] I. Kweon,et al. Image Captioning with Very Scarce Supervised Data: Adversarial Semi-Supervised Learning Approach , 2019, EMNLP.
[91] Zhe Gan,et al. TIGEr: Text-to-Image Grounding for Image Caption Evaluation , 2019, EMNLP.
[92] Lin Li,et al. Squeeze-and-Excitation Wide Residual Networks in Image Classification , 2019, 2019 IEEE International Conference on Image Processing (ICIP).
[93] Fawaz Sammani,et al. Look and Modify: Modification Networks for Image Captioning , 2019, BMVC.
[94] Yu-Wing Tai,et al. Reflective Decoding Network for Image Captioning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[95] Hanqing Lu,et al. Aligning Linguistic Words and Visual Semantic Units for Image Captioning , 2019, ACM Multimedia.
[96] Fenglin Liu,et al. Exploring and Distilling Cross-Modal Information for Image Captioning , 2019, IJCAI.
[97] Jiebo Luo,et al. Automatic Radiology Report Generation based on Multi-view Image Fusion and Medical Concept Enrichment , 2019, MICCAI.
[98] Heng Tao Shen,et al. Deliberate Attention Networks for Image Captioning , 2019, AAAI.
[99] Lin Wu,et al. CORAL8: Concurrent Object Regression for Area Localization in Medical Image Panels , 2019, MICCAI.
[100] Simao Herdade,et al. Image Captioning: Transforming Objects into Words , 2019, NeurIPS.
[101] Hanqing Lu,et al. MSCap: Multi-Style Image Captioning With Unpaired Stylized Text , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[102] Yuan Yan Tang,et al. Maximum Likelihood Estimation-Based Joint Sparse Representation for the Classification of Hyperspectral Remote Sensing Images , 2019, IEEE Transactions on Neural Networks and Learning Systems.
[103] Baoyuan Wu,et al. Exact Adversarial Attack to Image Captioning via Structured Output Learning With Latent Variables , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[104] Tao Mei,et al. Pointing Novel Objects in Image Captioning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[105] Gang Wang,et al. Unpaired Image Captioning via Scene Graph Alignments , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[106] Dong Liu,et al. Deep High-Resolution Representation Learning for Human Pose Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[107] Kang Li,et al. Visual to Text: Survey of Image and Video Captioning , 2019, IEEE Transactions on Emerging Topics in Computational Intelligence.
[108] Shahram Latifi,et al. Audio Enhancement and Synthesis using Generative Adversarial Networks: A Survey , 2019, International Journal of Computer Applications.
[109] Sanja Fidler,et al. Learning to Caption Images Through a Lifetime by Asking Questions , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[110] Md. Zakir Hossain,et al. A Comprehensive Survey of Deep Learning for Image Captioning , 2018, ACM Comput. Surv..
[111] Paul Babyn,et al. Generative Adversarial Network in Medical Imaging: A Review , 2018, Medical Image Anal..
[112] Sungroh Yoon,et al. How Generative Adversarial Networks and Their Variants Work , 2017, ACM Comput. Surv..
[113] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[114] Sam Kwong,et al. Deep sequential fusion LSTM network for image description , 2018, Neurocomputing.
[115] Shuang Bai,et al. A survey on automatic image caption generation , 2018, Neurocomputing.
[116] Tao Xu,et al. Multimodal Recurrent Model with Attention for Automated Radiology Report Generation , 2018, MICCAI.
[117] Wei Liu,et al. Recurrent Fusion Network for Image Captioning , 2018, ECCV.
[118] Bo Dai,et al. Rethinking the Form of Latent States in Image Captioning , 2018, ECCV.
[119] Qingyang Xu,et al. A survey on deep neural network-based image captioning , 2018, The Visual Computer.
[120] Xuanjing Huang,et al. Toward Diverse Text Generation with Inverse Reinforcement Learning , 2018, IJCAI.
[121] Lei Zhang,et al. Generating Diverse and Accurate Visual Captions by Comparative Adversarial Learning , 2018, ArXiv.
[122] Trevor Darrell,et al. Women also Snowboard: Overcoming Bias in Captioning Models , 2018, ECCV.
[123] Yongdong Zhang,et al. GLA: Global–Local Attention for Image Description , 2018, IEEE Transactions on Multimedia.
[124] Li Zhang,et al. A region-based image caption generator with refined descriptions , 2018, Neurocomputing.
[125] Jinfeng Yi,et al. Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning , 2017, ACL.
[126] Xirong Li,et al. Predicting Visual Features From Text for Image and Video Caption Retrieval , 2017, IEEE Transactions on Multimedia.
[127] Gang Wang,et al. Stack-Captioning: Coarse-to-Fine Learning for Image Captioning , 2017, AAAI.
[128] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[129] Xu Sun,et al. Diversity-Promoting GAN: A Cross-Entropy Based Generative Adversarial Network for Diversified Text Generation , 2018, EMNLP.
[130] Bo Zhao,et al. AI Challenger: A Large-scale Dataset for Going Deeper in Image Understanding , 2017, ArXiv.
[131] Rita Cucchiara,et al. Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention , 2017 .
[132] Lukasz Kaiser,et al. Attention Is All You Need , 2017, NIPS.
[133] Kevin Lin,et al. Adversarial Ranking for Language Generation , 2017, NIPS.
[134] Min Sun,et al. Show, Adapt and Tell: Adversarial Training of Cross-Domain Image Captioner , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[135] Garrison W. Cottrell,et al. Skeleton Key: Image Captioning by Skeleton-Attribute Decomposition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[136] Ping Tan,et al. DualGAN: Unsupervised Dual Learning for Image-to-Image Translation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[137] Bernt Schiele,et al. Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[138] Gang Wang,et al. An Empirical Study of Language CNN for Image Captioning , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[139] Richard Socher,et al. Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[140] Cordelia Schmid,et al. Areas of Attention for Image Captioning , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[141] Yash Goyal,et al. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering , 2016, International Journal of Computer Vision.
[142] Siqi Liu,et al. Improved Image Captioning via Policy Gradient optimization of SPIDEr , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[143] Zhe Gan,et al. Semantic Compositional Networks for Visual Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[144] Tao Mei,et al. Boosting Image Captioning with Attributes , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[145] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[146] Xirong Li,et al. Adding Chinese Captions to Images , 2016, ICMR.
[147] Jiebo Luo,et al. Image Captioning with Semantic Attention , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[148] Michael S. Bernstein,et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.
[149] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[150] Clement J. McDonald,et al. Preparing a collection of radiology examinations for distribution and retrieval , 2015, J. Am. Medical Informatics Assoc..
[151] David A. Shamma,et al. YFCC100M , 2015, Commun. ACM.
[152] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[153] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[154] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[155] Peter Young,et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions , 2014, TACL.
[156] Peter Young,et al. Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics , 2013, J. Artif. Intell. Res..
[157] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.