Andreas Bulling | Prajit Dhar | Florian Strohm | Ekta Sood | Fabian Kögel
[1] Yusuke Sugano et al. Seeing with Humans: Gaze-Assisted Neural Image Captioning, 2016, ArXiv.
[2] Alana de Santana Correia et al. Attention, please! A survey of neural attention models in deep learning, 2021, Artificial Intelligence Review.
[3] Allan Jabri et al. Revisiting Visual Question Answering Baselines, 2016, ECCV.
[4] Byoung-Tak Zhang et al. Multimodal Residual Learning for Visual QA, 2016, NIPS.
[5] Michael S. Bernstein et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations, 2016, International Journal of Computer Vision.
[6] Viv Bewick et al. Statistics review 7: Correlation and regression, 2003, Critical Care.
[7] Takayuki Okatani et al. Improved Fusion of Visual and Language Representations by Dense Symmetric Co-attention for Visual Question Answering, 2018, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] C. Constantinidis et al. Bottom-Up and Top-Down Attention, 2014, The Neuroscientist.
[9] Jorma Laaksonen et al. Paying Attention to Descriptions Generated by Image Captioning Models, 2017, IEEE International Conference on Computer Vision (ICCV).
[10] Zhou Yu et al. Multi-modal Factorized Bilinear Pooling with Co-attention Learning for Visual Question Answering, 2017, IEEE International Conference on Computer Vision (ICCV).
[11] Chuang Gan et al. Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering, 2019, AAAI.
[12] Mario Fritz et al. Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images, 2015, IEEE International Conference on Computer Vision (ICCV).
[13] Ling Shao et al. Understanding More About Human and Machine Attention in Deep Neural Networks, 2021, IEEE Transactions on Multimedia.
[14] Jianfeng Dong et al. Exploring Human-like Attention Supervision in Visual Question Answering, 2017, AAAI.
[15] Stefan Lee et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks, 2019, NeurIPS.
[16] Yifan Peng et al. Studying Relationships between Human Gaze, Description, and Computer Vision, 2013, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Chao Yang et al. Co-Attention Network With Question Type for Visual Question Answering, 2019, IEEE Access.
[18] Yash Goyal et al. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering, 2017, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Byoung-Tak Zhang et al. Bilinear Attention Networks, 2018, NeurIPS.
[20] Xinlei Chen et al. Pythia v0.1: The Winning Entry to the VQA Challenge 2018, 2018, ArXiv.
[21] Trevor Darrell et al. Multimodal Explanations: Justifying Decisions and Pointing to the Evidence, 2018, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Andreas Bulling et al. Improving Natural Language Processing Tasks with Human Gaze-Guided Neural Attention, 2020, NeurIPS.
[23] Byron C. Wallace et al. Attention is not Explanation, 2019, NAACL.
[24] Dhruv Batra et al. Analyzing the Behavior of Visual Question Answering Models, 2016, EMNLP.
[25] Jonathon S. Hare et al. Learning to Count Objects in Natural Images for Visual Question Answering, 2018, ICLR.
[26] Shi Chen et al. AiR: Attention with Reasoning Capability, 2020, ECCV.
[27] Mario Fritz et al. A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input, 2014, NIPS.
[28] Yiming Yang et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[29] Xinlei Chen et al. In Defense of Grid Features for Visual Question Answering, 2020, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Benjamin W. Tatler. The central fixation bias in scene viewing: Selecting an optimal viewing position independently of motor biases and image feature distributions, 2007, Journal of Vision.
[31] Xinlei Chen et al. Cycle-Consistency for Robust Visual Question Answering, 2019, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Lukasz Kaiser et al. Attention Is All You Need, 2017, NIPS.
[33] Yoshua Bengio et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[34] Ming-Wei Chang et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[35] Christopher Kanan et al. An Analysis of Visual Question Answering Algorithms, 2017, IEEE International Conference on Computer Vision (ICCV).
[36] Yoshua Bengio et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, 2015, ICML.
[37] Chen Sun et al. VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation, 2017, IEEE International Conference on Computer Vision (ICCV).
[38] Dhruv Batra et al. Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering, 2018, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Ngoc Thang Vu et al. Interpreting Attention Models with Human Visual Attention in Machine Reading Comprehension, 2020, CoNLL.
[40] Dhruv Batra et al. Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?, 2016, EMNLP.
[41] Eric Horvitz et al. SQuINTing at VQA Models: Introspecting VQA Models With Sub-Questions, 2020, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Niels da Vitoria Lobo et al. MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering, 2020, Findings of EMNLP.
[43] Qi Zhao et al. SALICON: Saliency in Context, 2015, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Zhou Yu et al. Deep Modular Co-Attention Networks for Visual Question Answering, 2019, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[45] Jiasen Lu et al. Hierarchical Question-Image Co-Attention for Visual Question Answering, 2016, NIPS.
[46] Diyi Yang et al. Hierarchical Attention Networks for Document Classification, 2016, NAACL.
[47] Ali Borji et al. Human Attention in Image Captioning: Dataset and Analysis, 2019, IEEE/CVF International Conference on Computer Vision (ICCV).
[48] Aude Oliva et al. How Much Time Do You Have? Modeling Multi-Duration Saliency, 2020, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[49] Jorma Laaksonen et al. Saliency Revisited: Analysis of Mouse Movements Versus Fixations, 2017, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[50] Alexander J. Smola et al. Stacked Attention Networks for Image Question Answering, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[51] Ignace T. C. Hooge et al. Gaze tracking accuracy in humans: One eye is sometimes better than two, 2018, Behavior Research Methods.
[52] Margaret Mitchell et al. VQA: Visual Question Answering, 2015, International Journal of Computer Vision.
[53] Zhuowen Tu et al. Aggregated Residual Transformations for Deep Neural Networks, 2017, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[54] Christopher Kanan et al. Challenges and Prospects in Vision and Language Research, 2019, Frontiers in Artificial Intelligence.
[55] Jeffrey Pennington et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.
[56] Yash Goyal et al. Yin and Yang: Balancing and Answering Binary Visual Questions, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[57] Lei Zhang et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering, 2018, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).