Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
暂无分享,去创建一个
[1] L. Carin,et al. Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-Training , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[3] Hanwang Zhang,et al. Two Causal Principles for Improving Visual Dialog , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Xingxing Zhang,et al. DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue , 2019, AAAI.
[5] Jianfeng Gao,et al. DIALOGPT : Large-Scale Generative Pre-training for Conversational Response Generation , 2019, ACL.
[6] Omer Levy,et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension , 2019, ACL.
[7] Peter J. Liu,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..
[8] Kevin Gimpel,et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations , 2019, ICLR.
[9] Yu Cheng,et al. UNITER: UNiversal Image-TExt Representation Learning , 2019, ECCV.
[10] Ahmed El Kholy,et al. UNITER: Learning UNiversal Image-TExt Representations , 2019, ECCV 2020.
[11] Abhishek Das,et al. Improving Generative Visual Dialog by Answering Diverse Questions , 2019, EMNLP.
[12] Hal Daumé,et al. Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning , 2019, EMNLP.
[13] Furu Wei,et al. VL-BERT: Pre-training of Generic Visual-Linguistic Representations , 2019, ICLR.
[14] Mohit Bansal,et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers , 2019, EMNLP.
[15] Nan Duan,et al. Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training , 2019, AAAI.
[16] Cho-Jui Hsieh,et al. VisualBERT: A Simple and Performant Baseline for Vision and Language , 2019, ArXiv.
[17] Stefan Lee,et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks , 2019, NeurIPS.
[18] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[19] Jesse Thomason,et al. Vision-and-Dialog Navigation , 2019, CoRL.
[20] Yiming Yang,et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding , 2019, NeurIPS.
[21] Song-Chun Zhu,et al. Reasoning Visual Dialogs With Structural and Partial Observations , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Tamir Hazan,et al. Factor Graph Attention , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Cordelia Schmid,et al. VideoBERT: A Joint Model for Video and Language Representation Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[24] José M. F. Moura,et al. CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog , 2019, NAACL.
[25] Dacheng Tao,et al. Image-Question-Answer Synergistic Network for Visual Dialog , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Zheng-Jun Zha,et al. Making History Matter: Gold-Critic Sequence Training for Visual Dialog , 2019, ArXiv.
[27] Zheng-Jun Zha,et al. Making History Matter: History-Advantage Sequence Training for Visual Dialog , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[28] Jung-Woo Ha,et al. Large-Scale Answerer in Questioner's Mind for Visual Dialog Question Generation , 2019, ICLR.
[29] Yu Cheng,et al. Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog , 2019, ACL.
[30] Anoop Cherian,et al. Audio Visual Scene-Aware Dialog , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Thomas Wolf,et al. TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents , 2019, ArXiv.
[32] Asim Kadav,et al. Visual Entailment: A Novel Task for Fine-Grained Image Understanding , 2019, ArXiv.
[33] Zhiwu Lu,et al. Recursive Visual Attention in Visual Dialog , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Ali Farhadi,et al. From Recognition to Cognition: Visual Commonsense Reasoning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Yoav Artzi,et al. A Corpus for Reasoning about Natural Language Grounded in Photographs , 2018, ACL.
[36] Raffaella Bernardi,et al. Beyond task success: A closer look at jointly learning to see, ask, and GuessWhat , 2018, NAACL.
[37] José M. F. Moura,et al. Visual Coreference Resolution in Visual Dialog using Neural Module Networks , 2018, ECCV.
[38] Jason Weston,et al. Talk the Walk: Navigating New York City through Grounded Dialogue , 2018, ArXiv.
[39] Radu Soricut,et al. Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning , 2018, ACL.
[40] Samuel R. Bowman,et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding , 2018, BlackboxNLP@EMNLP.
[41] Svetlana Lazebnik,et al. Two Can Play This Game: Visual Dialog with Discriminative Question Generation and Answering , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[42] Xi Chen,et al. Stacked Cross Attention for Image-Text Matching , 2018, ECCV.
[43] Philip H. S. Torr,et al. FLIPDIAL: A Generative Model for Two-Way Visual Dialogue , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[44] Qi Wu,et al. Are You Talking to Me? Reasoned Visual Dialog Generation Through Adversarial Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[45] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[46] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[47] Jiasen Lu,et al. Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model , 2017, NIPS.
[48] Stefan Lee,et al. Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[49] Olivier Pietquin,et al. End-to-end optimization of goal-driven and visually grounded dialogue systems , 2017, IJCAI.
[50] José M. F. Moura,et al. Visual Dialog , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[51] Hugo Larochelle,et al. GuessWhat?! Visual Object Discovery through Multi-modal Dialogue , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[52] Michael S. Bernstein,et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.
[53] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[54] Sanja Fidler,et al. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[55] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[56] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[57] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[58] Vicente Ordonez,et al. ReferItGame: Referring to Objects in Photographs of Natural Scenes , 2014, EMNLP.
[59] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[60] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[61] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[62] Peter Young,et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions , 2014, TACL.
[63] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[64] Byoung-Tak Zhang,et al. Dual Attention Networks for Visual Reference Resolution in Visual Dialog , 2019, EMNLP.
[65] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.