WeaQA: Weak Supervision via Captions for Visual Question Answering
Tejas Gokhale | Yezhou Yang | Pratyay Banerjee | Chitta Baral
[1] Omer Levy, et al. Annotation Artifacts in Natural Language Inference Data, 2018, NAACL.
[2] Mohit Bansal, et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers, 2019, EMNLP.
[3] Li Fei-Fei, et al. CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[5] Christopher Kanan, et al. An Analysis of Visual Question Answering Algorithms, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[6] Margaret Mitchell, et al. VQA: Visual Question Answering, 2015, International Journal of Computer Vision.
[7] Ludovic Denoyer, et al. Unsupervised Question Answering by Cloze Translation, 2019, ACL.
[8] Zhou Yu, et al. Deep Modular Co-Attention Networks for Visual Question Answering, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Gaurav Sharma, et al. An Empirical Evaluation of Visual Question Answering for Novel Objects, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Cordelia Schmid, et al. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).
[11] Luke Zettlemoyer, et al. Don’t Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases, 2019, EMNLP.
[12] Christopher D. Manning, et al. GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[14] Vaibhava Goel, et al. Unsupervised Adaptation of Question Answering Systems via Generative Self-training, 2020, EMNLP.
[15] Anton van den Hengel, et al. Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision, 2020, ECCV.
[16] Yonatan Belinkov, et al. Adversarial Regularization for Visual Question Answering: Strengths, Shortcomings, and Side Effects, 2019, Proceedings of the Second Workshop on Shortcomings in Vision and Language.
[17] Michael S. Bernstein, et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations, 2016, International Journal of Computer Vision.
[18] Chitta Baral, et al. Self-Supervised Knowledge Triplet Learning for Zero-shot Question Answering, 2020, EMNLP.
[19] Stefan Lee, et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks, 2019, NeurIPS.
[20] Christopher D. Manning, et al. Compositional Attention Networks for Machine Reasoning, 2018, ICLR.
[21] Zaïd Harchaoui, et al. On learning to localize objects with minimal supervision, 2014, ICML.
[22] Xinlei Chen, et al. Microsoft COCO Captions: Data Collection and Evaluation Server, 2015, ArXiv.
[23] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.
[24] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Rico Sennrich, et al. Improving Neural Machine Translation Models with Monolingual Data, 2015, ACL.
[26] Richard S. Zemel, et al. Exploring Models and Data for Image Question Answering, 2015, NIPS.
[27] Deva Ramanan, et al. Meta-Learning to Detect Rare Objects, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[28] Luke S. Zettlemoyer, et al. Question-Answer Driven Semantic Role Labeling: Using Natural Language to Annotate Natural Language, 2015, EMNLP.
[29] Trevor Darrell, et al. Localizing Moments in Video with Natural Language, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[30] Chitta Baral, et al. MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering, 2020, EMNLP.
[31] Shih-Fu Chang, et al. PPR-FCN: Weakly Supervised Visual Relation Detection via Parallel Pairwise R-FCN, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[32] Anton van den Hengel, et al. Unshuffling Data for Improved Generalization in Visual Question Answering, 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[33] Anton van den Hengel, et al. Actively Seeking and Learning From Live Data, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[34] George Kurian, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016, ArXiv.
[35] Jörg Tiedemann, et al. Parallel Data, Tools and Interfaces in OPUS, 2012, LREC.
[36] Neil D. Lawrence, et al. When Training and Test Sets Are Different: Characterizing Learning Transfer, 2009.
[37] Hung-Yu Kao, et al. Probing Neural Network Comprehension of Natural Language Arguments, 2019, ACL.
[38] Ramesh Nallapati, et al. Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering, 2020, ACL.
[39] Byoung-Tak Zhang, et al. Bilinear Attention Networks, 2018, NeurIPS.
[40] Chenxi Liu, et al. CLEVR-Ref+: Diagnosing Visual Reasoning With Referring Expressions, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Chitta Baral, et al. Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning, 2020, EMNLP.
[42] Trevor Darrell, et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[43] Shiliang Pu, et al. Counterfactual Samples Synthesizing for Robust Visual Question Answering, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[45] Sameer Singh, et al. Are Red Roses Red? Evaluating Consistency of Question-Answering Models, 2019, ACL.
[46] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.
[47] Dhruv Batra, et al. Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[48] Luke S. Zettlemoyer, et al. Large-Scale QA-SRL Parsing, 2018, ACL.
[49] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[50] Anton van den Hengel, et al. On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law, 2020, NeurIPS.
[51] Eduard Hovy, et al. Learning the Difference that Makes a Difference with Counterfactually-Augmented Data, 2020, ICLR.
[52] Yash Goyal, et al. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[53] Raymond J. Mooney, et al. Self-Critical Reasoning for Robust Visual Question Answering, 2019, NeurIPS.
[54] Radu Soricut, et al. Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning, 2018, ACL.
[55] Amit K. Roy-Chowdhury, et al. Weakly Supervised Video Moment Retrieval From Text Queries, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[56] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, ArXiv.
[57] Thomas Wolf, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, ArXiv.
[58] Shu Kong, et al. Weak Supervision and Referring Attention for Temporal-Textual Association Learning, 2020, ArXiv.
[59] Xinya Du, et al. Learning to Ask: Neural Question Generation for Reading Comprehension, 2017, ACL.
[60] Eric Horvitz, et al. SQuINTing at VQA Models: Introspecting VQA Models With Sub-Questions, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[61] Christopher Kanan, et al. Answer Them All! Toward Universal Visual Question Answering Models, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[62] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[63] Bernt Schiele, et al. Simple Does It: Weakly Supervised Instance and Semantic Segmentation, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[64] Michael S. Bernstein, et al. Information Maximizing Visual Question Generation, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[65] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[66] Yash Goyal, et al. Yin and Yang: Balancing and Answering Binary Visual Questions, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[67] Chitta Baral, et al. VQA-LOL: Visual Question Answering under the Lens of Logic, 2020, ECCV.
[68] Lei Zhang, et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[69] Jiebo Luo, et al. Open-Ended Visual Question Answering by Multi-Modal Domain Adaptation, 2019, Findings of EMNLP.
[70] Dhruv Batra, et al. C-VQA: A Compositional Split of the Visual Question Answering (VQA) v1.0 Dataset, 2017, ArXiv.
[71] Qing Li, et al. Why Does a Visual Question Have Different Answers?, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[72] Trevor Darrell, et al. Women also Snowboard: Overcoming Bias in Captioning Models, 2018, ECCV.
[73] Vicente Ordonez, et al. Im2Text: Describing Images Using 1 Million Captioned Photographs, 2011, NIPS.
[74] Ajay Divakaran, et al. Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation, 2019, EMNLP.
[75] Chitta Baral, et al. Self-Supervised Test-Time Learning for Reading Comprehension, 2021, NAACL.
[76] Yejin Choi, et al. WINOGRANDE: An Adversarial Winograd Schema Challenge at Scale, 2020, AAAI.
[77] Mario Fritz, et al. Towards a Visual Turing Challenge, 2014, ArXiv.
[78] Kevin Gimpel, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.
[79] Ahmed El Kholy, et al. UNITER: Learning UNiversal Image-TExt Representations, 2019, ECCV.
[80] Alexander J. Smola, et al. Stacked Attention Networks for Image Question Answering, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[81] Kaiming He, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[82] Matthieu Cord, et al. RUBi: Reducing Unimodal Biases in Visual Question Answering, 2019, NeurIPS.
[83] Ke Xu, et al. Harvesting and Refining Question-Answer Pairs for Unsupervised QA, 2020, ACL.
[84] Bolei Zhou, et al. Learning Deep Features for Discriminative Localization, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[85] Dietrich Klakow, et al. Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods, 2019, J. Artif. Intell. Res.
[86] Bolei Zhou, et al. Visual Question Generation as Dual Task of Visual Question Answering, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.