暂无分享,去创建一个
[1] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Alon Lavie,et al. Meteor Universal: Language Specific Translation Evaluation for Any Target Language , 2014, WMT@ACL.
[3] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[4] Li Fei-Fei,et al. DenseCap: Fully Convolutional Localization Networks for Dense Captioning , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2015, CVPR.
[6] Richard Socher,et al. Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Jiebo Luo,et al. Image Captioning with Semantic Attention , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Cyrus Rashtchian,et al. Every Picture Tells a Story: Generating Sentences from Images , 2010, ECCV.
[9] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[10] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Matt Post,et al. We start by defining the recurrent architecture as implemented in S OCKEYE , following , 2018 .
[12] Anton van den Hengel,et al. Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[13] Yann Dauphin,et al. Convolutional Sequence to Sequence Learning , 2017, ICML.
[14] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[15] C. Lawrence Zitnick,et al. CIDEr: Consensus-based image description evaluation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[17] Peter Young,et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions , 2014, TACL.
[18] Ruslan Salakhutdinov,et al. Multimodal Neural Language Models , 2014, ICML.
[19] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[20] Pratik Rane,et al. Self-Critical Sequence Training for Image Captioning , 2018 .
[21] Yejin Choi,et al. Collective Generation of Natural Image Descriptions , 2012, ACL.
[22] Tao Mei,et al. Boosting Image Captioning with Attributes , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[23] Xinlei Chen,et al. Mind's eye: A recurrent visual representation for image caption generation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[26] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[27] Svetlana Lazebnik,et al. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[28] Alexander G. Schwing,et al. Convolutional Image Captioning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[29] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[30] Karl Stratos,et al. Midge: Generating Image Descriptions From Computer Vision Detections , 2012, EACL.
[31] Christopher D. Manning,et al. Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.