Revisiting image captioning via maximum discrepancy competition
暂无分享,去创建一个
Yuming Fang | Boyang Wan | Yang Liu | Minwei Zhu | Wenhui Jiang | Qin Li | Yuming Fang | Wenhui Jiang | Qin Li | Yang Liu | Boyang Wan | Minwei Zhu
[1] Junbo Wang,et al. Learning visual relationship and context-aware attention for image captioning , 2020, Pattern Recognit..
[2] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[3] Peter Young,et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions , 2014, TACL.
[4] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[5] Georg Heigold,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2021, ICLR.
[6] Jie Chen,et al. Attention on Attention for Image Captioning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[7] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[8] Qin Jin,et al. Better Captioning With Sequence-Level Exploration , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Zhengfang Duanmu,et al. Group Maximum Differentiation Competition: Model Comparison with Few Samples , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[10] Xiongkuo Min,et al. DevsNet: Deep Video Saliency Network using Short-term and Long-term Cues , 2020, Pattern Recognit..
[11] Rita Cucchiara,et al. Meshed-Memory Transformer for Image Captioning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Gregory Shakhnarovich,et al. Discriminability Objective for Training Descriptive Captions , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[13] Tao Mei,et al. X-Linear Attention Networks for Image Captioning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Xinlei Chen,et al. nocaps: novel object captioning at scale , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[15] Cyrus Rashtchian,et al. Every Picture Tells a Story: Generating Sentences from Images , 2010, ECCV.
[16] Simao Herdade,et al. Image Captioning: Transforming Objects into Words , 2019, NeurIPS.
[17] Alon Lavie,et al. Meteor Universal: Language Specific Translation Evaluation for Any Target Language , 2014, WMT@ACL.
[18] Chin-Yew Lin,et al. ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.
[19] Vicente Ordonez,et al. Im2Text: Describing Images Using 1 Million Captioned Photographs , 2011, NIPS.
[20] Jianfei Cai,et al. Auto-Encoding Scene Graphs for Image Captioning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Eero P. Simoncelli,et al. Maximum differentiation (MAD) competition: a methodology for comparing computational models of perceptual quantities. , 2008, Journal of vision.
[22] Jing Liu,et al. Normalized and Geometry-Aware Self-Attention Network for Image Captioning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Yejin Choi,et al. Baby talk: Understanding and generating simple image descriptions , 2011, CVPR 2011.
[24] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[25] Gang Wang,et al. Stack-Captioning: Coarse-to-Fine Learning for Image Captioning , 2017, AAAI.
[26] Sheng Tang,et al. Image Caption with Global-Local Attention , 2017, AAAI.
[27] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Sutthiphong Srigrarom. Dynamics of isolated columnar vortex in tube with sinusoidal cross-section , 2005, J. Vis..
[29] Basura Fernando,et al. SPICE: Semantic Propositional Image Caption Evaluation , 2016, ECCV.
[30] In-So Kweon,et al. CBAM: Convolutional Block Attention Module , 2018, ECCV.
[31] Vaibhava Goel,et al. Self-Critical Sequence Training for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[33] Christof Koch,et al. A Model of Saliency-Based Visual Attention for Rapid Scene Analysis , 2009 .
[34] Yejin Choi,et al. Composing Simple Image Descriptions using Web-scale N-grams , 2011, CoNLL.
[35] Peng Wang,et al. Say As You Wish: Fine-Grained Control of Image Caption Generation With Abstract Scene Graphs , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[37] Frank Hutter,et al. Neural Architecture Search: A Survey , 2018, J. Mach. Learn. Res..
[38] Nevrez Imamoglu,et al. Video saliency detection by gestalt theory , 2019, Pattern Recognit..
[39] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[40] Luis G. Vargas,et al. Inconsistency and rank preservation , 1984 .
[41] Michael S. Bernstein,et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.
[42] Dumitru Erhan,et al. Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[43] Tianlong Chen,et al. I Am Going MAD: Maximum Discrepancy Competition for Comparing Classifiers Adaptively , 2020, ICLR.
[44] Yejin Choi,et al. TreeTalk: Composition and Compression of Trees for Image Descriptions , 2014, TACL.
[45] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[46] Wei Xu,et al. Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN) , 2014, ICLR.
[47] C. Lawrence Zitnick,et al. CIDEr: Consensus-based image description evaluation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[48] Enhua Wu,et al. Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[49] Shiming Xiang,et al. Dense semantic embedding network for image captioning , 2019, Pattern Recognit..
[50] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[51] C. V. Jawahar,et al. Choosing Linguistics over Vision to Describe Images , 2012, AAAI.
[52] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[53] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[54] Mert Kilickaya,et al. Re-evaluating Automatic Metrics for Image Captioning , 2016, EACL.
[55] Nassir Navab,et al. Concurrent Spatial and Channel Squeeze & Excitation in Fully Convolutional Networks , 2018, MICCAI.