[1] Gang Wang, et al. Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[2] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[3] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[4] Chitta Baral, et al. MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering, 2020, EMNLP.
[5] Alexander G. Schwing, et al. Creativity: Generating Diverse Questions Using Variational Autoencoders, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Zhiwu Lu, et al. Counterfactual VQA: A Cause-Effect Look at Language Bias, 2020, Computer Vision and Pattern Recognition.
[7] Shiliang Pu, et al. Counterfactual Samples Synthesizing for Robust Visual Question Answering, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Anton van den Hengel, et al. Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[9] Michael S. Bernstein, et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations, 2016, International Journal of Computer Vision.
[10] Ying Wu, et al. TA-Student VQA: Multi-Agents Training by Self-Questioning, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Wei Emma Zhang, et al. Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering, 2020, ECCV.
[12] Raymond J. Mooney, et al. Self-Critical Reasoning for Robust Visual Question Answering, 2019, NeurIPS.
[13] Matthieu Cord, et al. RUBi: Reducing Unimodal Biases in Visual Question Answering, 2019, NeurIPS.
[14] Dong Bok Lee, et al. Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs, 2020, ACL.
[15] Lei Zhang, et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[16] Xiaogang Wang, et al. CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[17] Dietrich Klakow, et al. Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods, 2019, J. Artif. Intell. Res.
[18] Bolei Zhou, et al. Visual Question Generation as Dual Task of Visual Question Answering, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[19] Phil Blunsom, et al. Language as a Latent Variable: Discrete Generative Models for Sentence Compression, 2016, EMNLP.
[20] Daan Wierstra, et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models, 2014, ICML.
[21] Michael S. Bernstein, et al. Information Maximizing Visual Question Generation, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Yu Cheng, et al. UNITER: UNiversal Image-TExt Representation Learning, 2019, ECCV.
[23] Eric P. Xing, et al. Toward Controlled Generation of Text, 2017, ICML.
[24] Dhruv Batra, et al. Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[25] Sang-goo Lee, et al. Data Augmentation for Spoken Language Understanding via Joint Variational Generation, 2018, AAAI.
[26] Phil Blunsom, et al. Neural Variational Inference for Text Processing, 2015, ICML.
[27] Lucia Specia, et al. Guiding Visual Question Generation, 2021, ArXiv.
[28] Yoshua Bengio, et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, 2014, ArXiv.
[29] Kaiming He, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[30] Alexander J. Smola, et al. Stacked Attention Networks for Image Question Answering, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Xinlei Chen, et al. Cycle-Consistency for Robust Visual Question Answering, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Roger Zimmermann, et al. Emerging Trends of Multimodal Research in Vision and Language, 2020, ArXiv.
[33] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[34] Danqi Chen, et al. of the Association for Computational Linguistics, 2001.
[35] Devi Parikh, et al. Contrast and Classify: Alternate Training for Robust VQA, 2020, ArXiv.
[36] Ajay Divakaran, et al. Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation, 2019, EMNLP.
[37] Xinlei Chen, et al. Pythia v0.1: the Winning Entry to the VQA Challenge 2018, 2018, ArXiv.
[38] Margaret Mitchell, et al. VQA: Visual Question Answering, 2015, International Journal of Computer Vision.
[39] Geoffrey Zweig, et al. From captions to visual concepts and back, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Luke Zettlemoyer, et al. Don't Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases, 2019, EMNLP.
[41] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[42] Sanja Fidler, et al. Order-Embeddings of Images and Language, 2015, ICLR.
[43] Yin Li, et al. Learning Deep Structure-Preserving Image-Text Embeddings, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Yu Cheng, et al. Large-Scale Adversarial Training for Vision-and-Language Representation Learning, 2020, NeurIPS.
[45] Ruslan Salakhutdinov, et al. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models, 2014, ArXiv.
[46] Mohit Bansal, et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers, 2019, EMNLP.
[47] Margaret Mitchell, et al. Generating Natural Questions About an Image, 2016, ACL.
[48] Jiasen Lu, et al. Hierarchical Question-Image Co-Attention for Visual Question Answering, 2016, NIPS.
[49] Karol Gregor, et al. Neural Variational Inference and Learning in Belief Networks, 2014, ICML.
[50] Yongdong Zhang, et al. Overcoming Language Priors with Self-supervised Learning for Visual Question Answering, 2020, IJCAI.
[51] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.
[52] Quoc V. Le, et al. Grounded Compositional Semantics for Finding and Describing Images with Sentences, 2014, TACL.
[53] Christopher Kanan, et al. Data Augmentation for Visual Question Answering, 2017, INLG.
[54] Juhan Nam, et al. Multimodal Deep Learning, 2011, ICML.
[55] Yijia Liu, et al. Sequence-to-Sequence Data Augmentation for Dialogue Language Understanding, 2018, COLING.
[56] Jacob Andreas, et al. Experience Grounds Language, 2020, EMNLP.
[57] Mario Fritz, et al. Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[58] Furu Wei, et al. VL-BERT: Pre-training of Generic Visual-Linguistic Representations, 2019, ICLR.
[59] Louis-Philippe Morency, et al. Multimodal Machine Learning: A Survey and Taxonomy, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.