Image caption generation with high-level image features
Arun Kumar Sangaiah | Shaohua Wan | Yuling Xi | Songtao Ding | Shiru Qu
[1] Cyrus Rashtchian, et al. Every Picture Tells a Story: Generating Sentences from Images, 2010, ECCV.
[2] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[3] Paolo Bartolomeo, et al. The Attention Systems of the Human Brain, 2014.
[4] Nitish Srivastava, et al. Learning Generative Models with Visual Attention, 2013, NIPS.
[5] Cyrus Rashtchian, et al. Collecting Image Annotations Using Amazon's Mechanical Turk, 2010, Mturk@HLT-NAACL.
[6] Ye Yuan, et al. Review Networks for Caption Generation, 2016, NIPS.
[7] Luc Van Gool, et al. The Pascal Visual Object Classes (VOC) Challenge, 2010, International Journal of Computer Vision.
[8] Alon Lavie, et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments, 2005, IEEvaluation@ACL.
[9] Yejin Choi, et al. TreeTalk: Composition and Compression of Trees for Image Descriptions, 2014, TACL.
[10] Kaiming He, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[11] Raymond W. Ptucha, et al. Automatic image assessment from facial attributes, 2013, Electronic Imaging.
[12] Jiebo Luo, et al. Image Captioning with Semantic Attention, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Chin-Yew Lin, et al. ROUGE: A Package for Automatic Evaluation of Summaries, 2004, ACL 2004.
[14] Samy Bengio, et al. Show and tell: A neural image caption generator, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Vicente Ordonez, et al. Im2Text: Describing Images Using 1 Million Captioned Photographs, 2011, NIPS.
[16] Yoshua Bengio, et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, 2015, ICML.
[17] Ronald A. Rensink. The Dynamic Representation of Scenes, 2000.
[18] Alex Graves, et al. Recurrent Models of Visual Attention, 2014, NIPS.
[19] Ruslan Salakhutdinov, et al. Multimodal Neural Language Models, 2014, ICML.
[20] P. Perona, et al. What do we perceive in a glance of a real-world scene?, 2007, Journal of Vision.
[21] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[22] C. Lawrence Zitnick, et al. CIDEr: Consensus-based image description evaluation, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Peter Young, et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions, 2014, TACL.
[24] Yejin Choi, et al. Baby talk: Understanding and generating simple image descriptions, 2011, CVPR 2011.
[25] Misha Denil, et al. Learning Where to Attend with Deep Architectures for Image Tracking, 2011, Neural Computation.
[26] Luc Van Gool, et al. Creating Summaries from User Videos, 2014, ECCV.
[27] Tao Mei, et al. Boosting Image Captioning with Attributes, 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[28] Trevor Darrell, et al. Long-Term Recurrent Convolutional Networks for Visual Recognition and Description, 2017.
[29] Wei Xu, et al. Explain Images with Multimodal Recurrent Neural Networks, 2014, ArXiv.