Human Attention in Visual Question Answering: Do Humans and Deep Networks look at the same regions?