Qika Lin | Jun Liu | Fangzhi Xu | Lingling Zhang | Tianzhe Zhao | Qi Chai | Yudai Pan
[1] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[2] Qinghua Zheng,et al. XTQA: Span-Level Explanations of the Textbook Question Answering , 2020, ArXiv.
[3] Doug Downey,et al. Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks , 2020, ACL.
[4] Jun Zhu,et al. Textbook Question Answering Under Instructor Guidance with Memory Networks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[5] Zhou Yu,et al. Deep Modular Co-Attention Networks for Visual Question Answering , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Shi Chen,et al. AiR: Attention with Reasoning Capability , 2020, ECCV.
[7] Byoung-Tak Zhang,et al. Bilinear Attention Networks , 2018, NeurIPS.
[8] José Manuél Gómez-Pérez,et al. Look, Read and Enrich - Learning from Scientific Figures and their Captions , 2019, K-CAP.
[9] Byoung-Tak Zhang,et al. Multimodal Residual Learning for Visual QA , 2016, NIPS.
[10] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[11] Mahmoud Khademi,et al. Multimodal Neural Graph Memory Networks for Visual Question Answering , 2020, ACL.
[12] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[13] Ali Farhadi,et al. A Diagram is Worth a Dozen Images , 2016, ECCV.
[14] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[15] Weifeng Zhang,et al. Cross-modal Knowledge Reasoning for Knowledge-based Visual Question Answering , 2020, Pattern Recognit.
[16] Tongliang Liu,et al. Relation-Aware Fine-Grained Reasoning Network for Textbook Question Answering , 2021, IEEE Transactions on Neural Networks and Learning Systems.
[17] Liang Lin,et al. Interpretable Visual Question Answering by Reasoning on Dependency Trees , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[18] Nojun Kwak,et al. Textbook Question Answering with Multi-modal Context Graph Understanding and Self-supervised Open-set Comprehension , 2018, ACL.
[19] Georg Heigold,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2021, ICLR.
[20] Jose Manuel Gomez-Perez,et al. ISAAQ - Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention , 2020, EMNLP.
[21] Jonghyun Choi,et al. Are You Smarter Than a Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[23] Lukasz Kaiser,et al. Attention Is All You Need , 2017, NIPS.
[24] Matthieu Cord,et al. MUTAN: Multimodal Tucker Fusion for Visual Question Answering , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[25] Soumen Chakrabarti,et al. Select, Substitute, Search: A New Benchmark for Knowledge-Augmented Visual Question Answering , 2021, SIGIR.