Stimulus-driven and concept-driven analysis for image caption generation
Yuling Xi | Songtao Ding | Shiru Qu | Shaohua Wan
[1] Christof Koch,et al. A Model of Saliency-Based Visual Attention for Rapid Scene Analysis, 2009.
[2] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[3] Yejin Choi,et al. Baby talk: Understanding and generating simple image descriptions , 2011, CVPR 2011.
[4] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[5] Paolo Bartolomeo,et al. The Attention Systems of the Human Brain, 2014.
[6] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[7] Geoffrey Zweig,et al. From captions to visual concepts and back , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Chin-Yew Lin,et al. ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.
[9] Wei Xu,et al. Explain Images with Multimodal Recurrent Neural Networks , 2014, ArXiv.
[10] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[11] Michael S. Bernstein,et al. Visual7W: Grounded Question Answering in Images , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[12] G. Mangun,et al. The neural mechanisms of top-down attentional control , 2000, Nature Neuroscience.
[13] Christopher Joseph Pal,et al. Describing Videos by Exploiting Temporal Structure , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[14] Liang Lin,et al. I2T: Image Parsing to Text Description , 2010, Proceedings of the IEEE.
[15] Jun Yu,et al. Multitask Autoencoder Model for Recovering Human Poses , 2018, IEEE Transactions on Industrial Electronics.
[16] Jun Yu,et al. Click Prediction for Web Image Reranking Using Multimodal Sparse Coding , 2014, IEEE Transactions on Image Processing.
[17] Alon Lavie,et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.
[18] Lei Cai,et al. Two-archive method for aggregation-based many-objective optimization , 2018, Inf. Sci.
[19] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[20] Yupu Yang,et al. Modeling coverage with semantic embedding for image caption generation , 2018, The Visual Computer.
[21] S. Nieuwenhuis,et al. Neural mechanisms of attention and control: losing our inhibitions? , 2005, Nature Neuroscience.
[22] C. Lawrence Zitnick,et al. CIDEr: Consensus-based image description evaluation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] J. Movshon,et al. Nature and interaction of signals from the receptive field center and surround in macaque V1 neurons. , 2002, Journal of neurophysiology.
[24] D. Heeger,et al. When size matters: attention affects performance by contrast or response gain , 2010, Nature Neuroscience.
[25] Ye Yuan,et al. Review Networks for Caption Generation , 2016, NIPS.
[26] Cyrus Rashtchian,et al. Collecting Image Annotations Using Amazon’s Mechanical Turk , 2010, Mturk@HLT-NAACL.
[27] Shaohua Wan,et al. A long video caption generation algorithm for big video data retrieval , 2019, Future Gener. Comput. Syst.
[28] P. Kay,et al. Basic Color Terms: Their Universality and Evolution, 1973.
[29] Jiebo Luo,et al. Image Captioning with Semantic Attention , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Jianping Fan,et al. iPrivacy: Image Privacy Protection by Identifying Sensitive Objects via Deep Multi-Task Learning , 2017, IEEE Transactions on Information Forensics and Security.
[32] Geoffrey E. Hinton,et al. Learning to combine foveal glimpses with a third-order Boltzmann machine , 2010, NIPS.
[33] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[34] Peter Young,et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions , 2014, TACL.
[35] Yongdong Zhang,et al. GLA: Global–Local Attention for Image Description , 2018, IEEE Transactions on Multimedia.
[36] Ronald A. Rensink. The Dynamic Representation of Scenes, 2000.
[37] Nitish Srivastava,et al. Learning Generative Models with Visual Attention , 2013, NIPS.
[38] Tao Mei,et al. Boosting Image Captioning with Attributes , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[39] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[40] Hans-Hellmut Nagel,et al. Knowledge representation for the generation of quantified natural language descriptions of vehicle traffic in image sequences , 1996, Proceedings of 3rd IEEE International Conference on Image Processing.
[41] M. Corbetta,et al. Control of goal-directed and stimulus-driven attention in the brain , 2002, Nature Reviews Neuroscience.
[42] Cyrus Rashtchian,et al. Every Picture Tells a Story: Generating Sentences from Images , 2010, ECCV.
[43] Cordelia Schmid,et al. Learning Color Names for Real-World Applications , 2009, IEEE Transactions on Image Processing.
[44] Alex Graves,et al. Recurrent Models of Visual Attention , 2014, NIPS.
[45] Jianping Fan,et al. Leveraging Content Sensitiveness and User Trustworthiness to Recommend Fine-Grained Privacy Settings for Social Image Sharing , 2018, IEEE Transactions on Information Forensics and Security.
[46] Arun Kumar Sangaiah,et al. Image caption generation with high-level image features , 2019, Pattern Recognit. Lett.