Prophet Attention: Predicting Attention with Future Attention for Improved Image Captioning
Xu Sun | Xuancheng Ren | Fenglin Liu | Wei Fan | Yuexian Zou | Xian Wu | Shen Ge
[1] Tao Mei,et al. Exploring Visual Relationship for Image Captioning , 2018, ECCV.
[2] Steven Bird,et al. NLTK: The Natural Language Toolkit , 2002, ACL.
[3] Svetlana Lazebnik,et al. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[4] Oladimeji Farri,et al. Neural Paraphrase Generation with Stacked Residual LSTM Networks , 2016, COLING.
[5] Alon Lavie,et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.
[6] Yixin Chen,et al. SHOW , 2018, Silent Cinema.
[7] Xinlei Chen,et al. Grounded Video Description , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Jongwook Choi,et al. Supervising Neural Attention Models for Video Captioning by Human Gaze Data , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Basura Fernando,et al. SPICE: Semantic Propositional Image Caption Evaluation , 2016, ECCV.
[10] Garrison W. Cottrell,et al. Skeleton Key: Image Captioning by Skeleton-Attribute Decomposition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[12] C. Lawrence Zitnick,et al. CIDEr: Consensus-based image description evaluation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Jianwei Yang,et al. Neural Baby Talk , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[15] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[16] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[17] Weiping Wang,et al. Generating Paraphrase with Topic as Prior Knowledge , 2019, CIKM.
[18] Tao Mei,et al. X-Linear Attention Networks for Image Captioning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Jie Chen,et al. Attention on Attention for Image Captioning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[20] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[21] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Tao Mei,et al. Boosting Image Captioning with Attributes , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[23] Dhruv Batra,et al. Human Attention in Visual Question Answering: Do Humans and Deep Networks look at the same regions? , 2016, EMNLP.
[24] Ankush Gupta,et al. A Deep Generative Framework for Paraphrase Generation , 2017, AAAI.
[25] Jeffrey L. Elman,et al. Finding Structure in Time , 1990, Cogn. Sci.
[26] Tao Mei,et al. Jointly Modeling Embedding and Translation to Bridge Video and Language , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Xinlei Chen,et al. Microsoft COCO Captions: Data Collection and Evaluation Server , 2015, ArXiv.
[28] Vaibhava Goel,et al. Self-Critical Sequence Training for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Nitin Madnani,et al. Generating Phrasal and Sentential Paraphrases: A Survey of Data-Driven Methods , 2010, CL.
[30] Zhe Gan,et al. Distilling Knowledge Learned in BERT for Text Generation , 2019, ACL.
[31] Yu-Gang Jiang,et al. Motion Guided Spatial Attention for Video Captioning , 2019, AAAI.
[32] Zhe Gan,et al. Semantic Compositional Networks for Visual Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Liang Wang,et al. Referring Expression Generation and Comprehension via Attributes , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[34] Tat-Seng Chua,et al. SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Trevor Darrell,et al. Grounding Visual Explanations , 2018, ECCV.
[36] Xu Sun,et al. simNet: Stepwise Image-Topic Merging Network for Generating Detailed and Comprehensive Image Captions , 2018, EMNLP.
[37] Rita Cucchiara,et al. Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Tianhao Zhang,et al. Exploring Semantic Relationships for Image Captioning without Parallel Data , 2019, 2019 IEEE International Conference on Data Mining (ICDM).
[39] Christopher D. Manning,et al. Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.
[40] Richard Socher,et al. Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and VQA , 2017, ArXiv.
[42] Alexander M. Rush,et al. OpenNMT: Open-Source Toolkit for Neural Machine Translation , 2017, ACL.
[43] Hongxia Jin,et al. Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[44] Chenxi Liu,et al. Attention Correctness in Neural Image Captioning , 2016, AAAI.
[45] Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations , 2019, NeurIPS.
[46] Fenglin Liu,et al. Exploring and Distilling Cross-Modal Information for Image Captioning , 2019, IJCAI.
[47] Simao Herdade,et al. Image Captioning: Transforming Objects into Words , 2019, NeurIPS.
[48] Tao Mei,et al. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[49] Conghui Zhu,et al. Modeling Future Cost for Neural Machine Translation , 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[50] Jiebo Luo,et al. Image Captioning with Semantic Attention , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[51] Eduard H. Hovy,et al. Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics , 2003, NAACL.
[52] Ning Zhang,et al. Deep Reinforcement Learning-Based Image Captioning with Embedding Reward , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[53] Xian Wu,et al. Federated Learning for Vision-and-Language Grounding Problems , 2020, AAAI.
[54] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[55] Marcella Cornia,et al. Meshed-Memory Transformer for Image Captioning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[56] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[57] Rebecca J. Passonneau,et al. Wise Crowd Content Assessment and Educational Rubrics , 2016, International Journal of Artificial Intelligence in Education.
[58] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[59] Hongtao Lu,et al. Look Back and Predict Forward in Image Captioning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[60] Peter J. Liu,et al. PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization , 2019, ICML.
[61] Ghassan AlRegib,et al. Learning to Generate Grounded Visual Captions Without Localization Supervision , 2019, ECCV.
[62] Jianfei Cai,et al. Auto-Encoding Scene Graphs for Image Captioning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[63] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[64] Wenmin Wang,et al. Adaptively Aligned Image Captioning via Adaptive Attention Time , 2019, NeurIPS.