Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Yoshua Bengio | Ruslan Salakhutdinov | Richard S. Zemel | Aaron C. Courville | Kyunghyun Cho | Kelvin Xu | Jimmy Ba | Ryan Kiros