Yezhou Yang | Shailaja Keyur Sampat | Chitta Baral | Akshay Kumar
[1] Margaret Mitchell, et al. VQA: Visual Question Answering, 2015, International Journal of Computer Vision.
[2] Yoshua Bengio, et al. FigureQA: An Annotated Figure Dataset for Visual Reasoning, 2017, ICLR.
[3] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[4] Yiannis Aloimonos, et al. Following Instructions by Imagining and Reaching Visual Goals, 2020, ArXiv.
[5] Ali Farhadi, et al. From Recognition to Cognition: Visual Commonsense Reasoning, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Jonghyun Choi, et al. Are You Smarter Than a Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Chuang Gan, et al. Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding, 2018, NeurIPS.
[8] Bernt Schiele, et al. Generative Adversarial Text to Image Synthesis, 2016, ICML.
[9] Guokun Lai, et al. RACE: Large-scale ReAding Comprehension Dataset From Examinations, 2017, EMNLP.
[10] Myle Ott, et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling, 2019, NAACL.
[11] Li Fei-Fei, et al. Composing Text and Image for Image Retrieval - an Empirical Odyssey, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Terry Winograd, et al. Procedures As A Representation For Data In A Computer Program For Understanding Natural Language, 1971.
[13] Khanh Nguyen, et al. Vision-Based Navigation With Language-Based Assistance via Imitation Learning With Indirect Intervention, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Guosheng Lin, et al. Graph Edit Distance Reward: Learning to Edit Scene Graph, 2020, ECCV.
[15] Li Fei-Fei, et al. Inferring and Executing Programs for Visual Reasoning, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[16] Mohit Bansal, et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers, 2019, EMNLP.
[17] Dan Klein, et al. Neural Module Networks, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Seonghyeon Nam, et al. Text-Adaptive Generative Adversarial Networks: Manipulating Images with Natural Language, 2018, NeurIPS.
[19] Christopher D. Manning, et al. GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[20] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Roozbeh Mottaghi, et al. ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Yike Guo, et al. Semantic Image Synthesis via Adversarial Learning, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[23] Gholamreza Haffari, et al. Scene Graph Modification Based on Natural Language Commands, 2020, FINDINGS.
[24] Anton van den Hengel, et al. Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision, 2020, ECCV.
[25] José M. F. Moura, et al. CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog, 2019, NAACL.
[26] Xiao-Jing Wang, et al. A dataset and architecture for visual reasoning with a working memory, 2018, ECCV.
[27] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[28] Richard S. Zemel, et al. Exploring Models and Data for Image Question Answering, 2015, NIPS.
[29] Qi Wu, et al. Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[30] Yoav Artzi, et al. TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Brian L. Price, et al. DVQA: Understanding Data Visualizations via Question Answering, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[32] Chitta Baral, et al. Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning, 2020, EMNLP.
[33] Li Fei-Fei, et al. CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Daniel Petrie, et al. Are You Smarter Than a Fifth Grader, 2010.
[35] Ross B. Girshick, et al. Mask R-CNN, 2017, arXiv:1703.06870.
[36] Cho-Jui Hsieh, et al. What Does BERT with Vision Look At?, 2020, ACL.
[37] Ali Farhadi, et al. IQA: Visual Question Answering in Interactive Environments, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[38] Yejin Choi, et al. VisualCOMET: Reasoning About the Dynamic Context of a Still Image, 2020, ECCV.
[39] Peter Clark, et al. WIQA: A dataset for “What if...” reasoning over procedural text, 2019, EMNLP.
[40] Marc'Aurelio Ranzato, et al. Mixture Models for Diverse Machine Translation: Tricks of the Trade, 2019, ICML.
[41] Mario Fritz, et al. Answering Visual What-If Questions: From Actions to Predicted Scene Descriptions, 2018, ECCV Workshops.
[42] David Gaddy, et al. Pre-Learning Environment Representations for Data-Efficient Neural Instruction Following, 2019, ACL.