Human Attention in Image Captioning: Dataset and Analysis
Sen He | Hamed R. Tavakoli | Ali Borji | Nicolas Pugeault
[1] Jorma Laaksonen, et al. Paying Attention to Descriptions Generated by Image Captioning Models, ICCV, 2017.
[2] Yifan Peng, et al. Studying Relationships between Human Gaze, Description, and Computer Vision, CVPR, 2013.
[3] Rita Cucchiara, et al. SAM: Pushing the Limits of Saliency Prediction Models, CVPR Workshops, 2018.
[4] Andrew Zisserman, et al. Spatial Transformer Networks, NIPS, 2015.
[5] Richard Socher, et al. Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning, CVPR, 2017.
[6] Alon Lavie, et al. Meteor Universal: Language Specific Translation Evaluation for Any Target Language, WMT@ACL, 2014.
[7] Mihai Surdeanu, et al. The Stanford CoreNLP Natural Language Processing Toolkit, ACL, 2014.
[8] Jeff B. Pelz, et al. SNAG: Spoken Narratives and Gaze Dataset, ACL, 2018.
[9] Vladlen Koltun, et al. Multi-Scale Context Aggregation by Dilated Convolutions, ICLR, 2016.
[10] Michael A. Arbib, et al. Action to Language via the Mirror Neuron System: Attention and the minimal subscene, 2006.
[11] Peter Young, et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions, TACL, 2014.
[12] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, ECCV, 2014.
[13] Emiel Krahmer, et al. DIDEC: The Dutch Image Description and Eye-tracking Corpus, COLING, 2018.
[14] S. Ullman, et al. Shifts in selective visual attention: towards the underlying neural circuitry, Human Neurobiology, 1985.
[15] Ali Borji, et al. State-of-the-Art in Visual Attention Modeling, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.
[16] Qi Zhao, et al. Boosted Attention: Leveraging Human Attention for Image Captioning, ECCV, 2018.
[17] Ali Borji, et al. Saliency Prediction in the Deep Learning Era: An Empirical Investigation, arXiv, 2018.
[18] Fei-Fei Li, et al. Deep visual-semantic alignments for generating image descriptions, CVPR, 2015.
[19] Yoshua Bengio, et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, ICML, 2015.
[20] Jian Sun, et al. Deep Residual Learning for Image Recognition, CVPR, 2016.
[21] Jürgen Schmidhuber, et al. Long Short-Term Memory, Neural Computation, 1997.
[22] Chin-Yew Lin, et al. ROUGE: A Package for Automatic Evaluation of Summaries, ACL Workshop on Text Summarization Branches Out, 2004.
[23] M. Arbib. Action to language via the mirror neuron system, 2006.
[24] Tat-Seng Chua, et al. SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning, CVPR, 2017.
[25] Meinard Müller. Information retrieval for music and motion, 2007.
[26] Yoshua Bengio, et al. Generative Adversarial Nets, NIPS, 2014.
[27] Xiaofeng Wu, et al. Scanpath estimation based on foveated image saliency, Cognitive Processing, 2017.
[28] Frédo Durand, et al. What Do Different Evaluation Metrics Tell Us About Saliency Models?, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.
[29] Noel E. O'Connor, et al. SalGAN: Visual Saliency Prediction with Generative Adversarial Networks, arXiv, 2017.
[30] C. Lawrence Zitnick, et al. Collecting Image Description Datasets using Crowdsourcing, arXiv, 2014.
[31] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, ACL, 2002.
[32] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR, 2015.
[33] Julie C. Sedivy, et al. Integration of visual and linguistic information in spoken language comprehension, Science, 1995.
[34] Leon A. Gatys, et al. Understanding Low- and High-Level Contributions to Fixation Prediction, ICCV, 2017.
[35] Dhruv Batra, et al. Human Attention in Visual Question Answering: Do Humans and Deep Networks look at the same regions?, EMNLP, 2016.
[36] Ali Borji, et al. Quantitative Analysis of Human-Model Agreement in Visual Saliency Modeling: A Comparative Study, IEEE Transactions on Image Processing, 2013.
[37] Qi Zhao, et al. SALICON: Saliency in Context, CVPR, 2015.
[38] C. Lawrence Zitnick, et al. CIDEr: Consensus-based image description evaluation, CVPR, 2015.
[39] Ali Borji, et al. Understanding and Visualizing Deep Visual Saliency Models, CVPR, 2019.
[40] Samy Bengio, et al. Show and tell: A neural image caption generator, CVPR, 2015.
[41] Dumitru Erhan, et al. Going deeper with convolutions, CVPR, 2015.