[1] Yoav Artzi, et al. A Corpus for Reasoning about Natural Language Grounded in Photographs, 2018, ACL.
[2] Hongxia Jin, et al. Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded, 2019, ICCV.
[3] Zhiwu Lu, et al. Counterfactual VQA: A Cause-Effect Look at Language Bias, 2021, CVPR.
[4] Yu Cheng, et al. Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models, 2020, ECCV.
[5] Bernard Ghanem, et al. FLAG: Adversarial Data Augmentation for Graph Neural Networks, 2020, ArXiv.
[6] Cordelia Schmid, et al. VideoBERT: A Joint Model for Video and Language Representation Learning, 2019, ICCV.
[7] David A. Wagner, et al. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples, 2018, ICML.
[8] David Reitter, et al. Fusion of Detected Objects in Text for Visual Question Answering, 2019, EMNLP.
[9] Eric Horvitz, et al. SQuINTing at VQA Models: Interrogating VQA Models with Sub-Questions, 2020, ArXiv.
[10] Jianfeng Gao, et al. Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks, 2020, ECCV.
[11] Chitta Baral, et al. VQA-LOL: Visual Question Answering under the Lens of Logic, 2020, ECCV.
[12] Jianfeng Gao, et al. Adversarial Training for Large Neural Language Models, 2020, ArXiv.
[13] Anton van den Hengel, et al. Unshuffling Data for Improved Generalization, 2020, ArXiv.
[14] Ajay Divakaran, et al. Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation, 2019, EMNLP.
[15] Bernt Schiele, et al. Adversarial Scene Editing: Automatic Object Removal from Weak Supervision, 2018, NeurIPS.
[16] Alan L. Yuille, et al. Feature Denoising for Improving Adversarial Robustness, 2019, CVPR.
[17] Dan Boneh, et al. Ensemble Adversarial Training: Attacks and Defenses, 2017, ICLR.
[18] Yash Goyal, et al. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering, 2017, CVPR.
[19] Anton van den Hengel, et al. Counterfactual Vision and Language Learning, 2020, CVPR.
[20] Lei Zhang, et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering, 2018, CVPR.
[21] Eric Horvitz, et al. SQuINTing at VQA Models: Introspecting VQA Models with Sub-Questions, 2020, CVPR.
[22] Yonatan Belinkov, et al. Adversarial Regularization for Visual Question Answering: Strengths, Shortcomings, and Side Effects, 2019, Proceedings of the Second Workshop on Shortcomings in Vision and Language.
[23] Nan Duan, et al. Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training, 2019, AAAI.
[24] Yu Cheng, et al. UNITER: UNiversal Image-TExt Representation Learning, 2019, ECCV.
[25] Yu Cheng, et al. FreeLB: Enhanced Adversarial Training for Natural Language Understanding, 2020, ICLR.
[26] Ali Farhadi, et al. From Recognition to Cognition: Visual Commonsense Reasoning, 2019, CVPR.
[27] Zhe Gan, et al. HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training, 2020, EMNLP.
[28] Cho-Jui Hsieh, et al. VisualBERT: A Simple and Performant Baseline for Vision and Language, 2019, ArXiv.
[29] Jianfeng Gao, et al. Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-Training, 2020, CVPR.
[30] Mohit Bansal, et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers, 2019, EMNLP.
[31] Cordelia Schmid, et al. Learning Video Representations using Contrastive Bidirectional Transformer, 2019, ArXiv.
[32] Jianfeng Gao, et al. VIVO: Surpassing Human Performance in Novel Object Captioning with Visual Vocabulary Pre-Training, 2020, ArXiv.
[33] J. Zico Kolter, et al. Fast is better than free: Revisiting adversarial training, 2020, ICLR.
[34] Zhou Yu, et al. Deep Modular Co-Attention Networks for Visual Question Answering, 2019, CVPR.
[35] Dhruv Batra, et al. Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering, 2018, CVPR.
[36] Xinlei Chen, et al. Microsoft COCO Captions: Data Collection and Evaluation Server, 2015, ArXiv.
[37] Michael I. Jordan, et al. Theoretically Principled Trade-off between Robustness and Accuracy, 2019, ICML.
[38] Xi Chen, et al. Stacked Cross Attention for Image-Text Matching, 2018, ECCV.
[39] Larry S. Davis, et al. Adversarial Training for Free!, 2019, NeurIPS.
[40] Christian Wolf, et al. Roses are Red, Violets are Blue… But Should VQA expect Them To?, 2021, CVPR.
[41] Matthieu Cord, et al. RUBi: Reducing Unimodal Biases in Visual Question Answering, 2019, NeurIPS.
[42] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.
[43] Vahid Kazemi, et al. Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering, 2017, ArXiv.
[44] Luke Zettlemoyer, et al. Don't Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases, 2019, EMNLP.
[45] Margaret Mitchell, et al. VQA: Visual Question Answering, 2015, IJCV.
[46] Aleksander Madry, et al. Towards Deep Learning Models Resistant to Adversarial Attacks, 2017, ICLR.
[47] Anton van den Hengel, et al. On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law, 2020, NeurIPS.
[48] Jonathon Shlens, et al. Explaining and Harnessing Adversarial Examples, 2014, ICLR.
[49] Yi Yang, et al. ActBERT: Learning Global-Local Video-Text Representations, 2020, CVPR.
[50] Arjun Majumdar, et al. Improving Vision-and-Language Navigation with Image-Text Pairs from the Web, 2020, ECCV.
[51] Myle Ott, et al. Scaling Neural Machine Translation, 2018, WMT.
[52] Jianlong Fu, et al. Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers, 2020, ArXiv.
[53] Raymond J. Mooney, et al. Self-Critical Reasoning for Robust Visual Question Answering, 2019, NeurIPS.
[54] Christopher D. Manning, et al. GQA: a new dataset for compositional question answering over real-world images, 2019, ArXiv.
[55] Joan Bruna, et al. Intriguing properties of neural networks, 2013, ICLR.
[56] Yunde Jia, et al. Overcoming Language Priors in VQA via Decomposed Linguistic Representations, 2020, AAAI.
[57] Anurag Mittal, et al. Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder, 2020, ECCV.
[58] Peng Gao, et al. Contrastive Visual-Linguistic Pretraining, 2020, ArXiv.
[59] Abhishek Das, et al. Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline, 2020, ECCV.
[60] Stefan Lee, et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks, 2019, NeurIPS.
[61] Christopher D. Manning, et al. Learning by Abstraction: The Neural State Machine, 2019, NeurIPS.
[62] Anton van den Hengel, et al. Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision, 2020, ECCV.
[63] Hongxia Yang, et al. InterBERT: Vision-and-Language Interaction for Multi-modal Pretraining, 2020, ArXiv.
[64] Furu Wei, et al. VL-BERT: Pre-training of Generic Visual-Linguistic Representations, 2019, ICLR.
[65] Jianfeng Gao, et al. Unified Vision-Language Pre-Training for Image Captioning and VQA, 2020, AAAI.
[66] Shiliang Pu, et al. Counterfactual Samples Synthesizing for Robust Visual Question Answering, 2020, CVPR.
[67] Shih-Fu Chang, et al. Weakly-supervised VisualBERT: Pre-training without Parallel Images and Captions, 2020, ArXiv.
[68] Quoc V. Le, et al. Adversarial Examples Improve Image Recognition, 2020, CVPR.
[69] Yu Cheng, et al. Large-Scale Adversarial Training for Vision-and-Language Representation Learning, 2020, NeurIPS.
[70] Marcus Rohrbach, et al. 12-in-1: Multi-Task Vision and Language Representation Learning, 2020, CVPR.
[71] Xinlei Chen, et al. Pythia v0.1: the Winning Entry to the VQA Challenge 2018, 2018, ArXiv.
[72] Jingren Zhou, et al. InterBERT: An Effective Multi-Modal Pretraining Approach via Vision-and-Language Interaction, 2020, ArXiv.
[73] Wenhu Chen, et al. Meta Module Network for Compositional Visual Reasoning, 2021, WACV.
[74] Martin J. Wainwright, et al. Randomized Smoothing for Stochastic Optimization, 2011, SIAM J. Optim.
[75] Yue Wang, et al. VD-BERT: A Unified Vision and Dialog Transformer with BERT, 2020, EMNLP.
[76] Xiaodong Liu, et al. SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization, 2020, ACL.
[77] Licheng Yu, et al. Modeling Context in Referring Expressions, 2016, ECCV.
[78] Mario Fritz, et al. Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing, 2020, CVPR.
[79] Matthias Bethge, et al. A Simple Way to Make Neural Networks Robust Against Diverse Image Corruptions, 2020, ECCV.
[80] Chitta Baral, et al. MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering, 2020, EMNLP.
[81] Wei Emma Zhang, et al. Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering, 2020, ECCV.
[82] Hao Tian, et al. ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph, 2020, ArXiv.
[83] Hang Su, et al. Boosting Adversarial Training with Hypersphere Embedding, 2020, NeurIPS.