Multi-Level Policy and Reward-Based Deep Reinforcement Learning Framework for Image Captioning
暂无分享,去创建一个
Yongdong Zhang | Ning Xu | Yuting Su | Hanwang Zhang | Anan Liu | Weizhi Nie | Jie Nie | Hanwang Zhang | N. Xu | Anan Liu | Yuting Su | Weizhi Nie | Yongdong Zhang | Jie Nie
[1] Fuchun Sun,et al. MAT: A Multimodal Attentive Translator for Image Captioning , 2017, IJCAI.
[2] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[3] Ruslan Salakhutdinov,et al. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models , 2014, ArXiv.
[4] Arnold W. M. Smeulders,et al. Words Matter: Scene Text for Image Classification and Retrieval , 2017, IEEE Transactions on Multimedia.
[5] Karl Stratos,et al. Midge: Generating Image Descriptions From Computer Vision Detections , 2012, EACL.
[6] Meng Wang,et al. VideoWhisper: Toward Discriminative Unsupervised Video Feature Learning With Attention-Based Recurrent Neural Networks , 2017, IEEE Transactions on Multimedia.
[7] Heng Tao Shen,et al. Video Captioning With Attention-Based LSTM and Semantic Consistency , 2017, IEEE Transactions on Multimedia.
[8] Richard Socher,et al. Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Jin Young Choi,et al. Action-Decision Networks for Visual Tracking with Deep Reinforcement Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Ye Yuan,et al. Encode, Review, and Decode: Reviewer Module for Caption Generation , 2016, ArXiv.
[11] Wei Liu,et al. Regularizing RNNs for Caption Generation by Reconstructing the Past with the Present , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[12] Marc'Aurelio Ranzato,et al. Sequence Level Training with Recurrent Neural Networks , 2015, ICLR.
[13] Gang Hua,et al. Collaborative Deep Reinforcement Learning for Joint Object Search , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Wei Xu,et al. Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN) , 2014, ICLR.
[15] Tat-Seng Chua,et al. SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[17] Xu Jia,et al. Guiding the Long-Short Term Memory Model for Image Caption Generation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[18] Cyrus Rashtchian,et al. Every Picture Tells a Story: Generating Sentences from Images , 2010, ECCV.
[19] Ali Farhadi,et al. Target-driven visual navigation in indoor scenes using deep reinforcement learning , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[20] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Yejin Choi,et al. Collective Generation of Natural Image Descriptions , 2012, ACL.
[22] Tao Mei,et al. Boosting Image Captioning with Attributes , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[23] José M. F. Moura,et al. Visual Dialog , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Alon Lavie,et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.
[26] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017 .
[27] Armand Joulin,et al. Deep Fragment Embeddings for Bidirectional Image Sentence Mapping , 2014, NIPS.
[28] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[29] C. Lawrence Zitnick,et al. CIDEr: Consensus-based image description evaluation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Yejin Choi,et al. TreeTalk: Composition and Compression of Trees for Image Descriptions , 2014, TACL.
[31] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[32] Frank Keller,et al. Image Description using Visual Dependency Representations , 2013, EMNLP.
[33] Shaomei Wu,et al. Automatic Alt-text: Computer-generated Image Descriptions for Blind Users on a Social Network Service , 2017, CSCW.
[34] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[35] B. S. Manjunath,et al. Video Annotation Through Search and Graph Reinforcement Mining , 2010, IEEE Transactions on Multimedia.
[36] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[37] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[38] Alan L. Yuille,et al. Joint Image-Text Representation by Gaussian Visual-Semantic Embedding , 2016, ACM Multimedia.
[39] Yongdong Zhang,et al. Multi-Level Policy and Reward Reinforcement Learning for Image Captioning , 2018, IJCAI.
[40] Siqi Liu,et al. Improved Image Captioning via Policy Gradient optimization of SPIDEr , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[41] Svetlana Lazebnik,et al. Active Object Localization with Deep Reinforcement Learning , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[42] Yejin Choi,et al. Composing Simple Image Descriptions using Web-scale N-grams , 2011, CoNLL.
[43] M. Corbetta,et al. Control of goal-directed and stimulus-driven attention in the brain , 2002, Nature Reviews Neuroscience.
[44] Eduard H. Hovy,et al. Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics , 2003, NAACL.
[45] Tao Mei,et al. Jointly Modeling Embedding and Translation to Bridge Video and Language , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[46] Vaibhava Goel,et al. Self-Critical Sequence Training for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[47] Ahmet Aker,et al. Generating Image Descriptions Using Dependency Relational Patterns , 2010, ACL.
[48] Henning Müller,et al. Retrieval From and Understanding of Large-Scale Multi-modal Medical Datasets: A Review , 2017, IEEE Transactions on Multimedia.
[49] Sanja Fidler,et al. Skip-Thought Vectors , 2015, NIPS.
[50] Eric Brachmann,et al. PoseAgent: Budget-Constrained 6D Object Pose Estimation via Reinforcement Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[51] Chunhua Shen,et al. What Value Do Explicit High Level Concepts Have in Vision to Language Problems? , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[52] Rongrong Ji,et al. GroupCap: Group-Based Image Captioning with Structured Relevance and Diversity Constraints , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[53] Yejin Choi,et al. Baby talk: Understanding and generating simple image descriptions , 2011, CVPR 2011.
[54] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[55] Ning Zhang,et al. Deep Reinforcement Learning-Based Image Captioning with Embedding Reward , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[56] Peter Young,et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions , 2014, TACL.
[57] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[58] Jiebo Luo,et al. Image Captioning with Semantic Attention , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[59] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).