暂无分享,去创建一个
Sicheng Yang | Dandan Song | Lejian Liao | Siyi Ma | Zhanchen Sun | L. Liao | Dandan Song | Sicheng Yang | S. Ma | Zhanchen Sun
[1] Matthieu Cord,et al. MUREL: Multimodal Relational Reasoning for Visual Question Answering , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Jason Weston,et al. Translating Embeddings for Modeling Multi-relational Data , 2013, NIPS.
[3] Ross B. Girshick,et al. Fast R-CNN , 2015, 1504.08083.
[4] Alexander Schwing,et al. Fast, Diverse and Accurate Image Captioning Guided by Part-Of-Speech , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Yu Cheng,et al. Large-Scale Adversarial Training for Vision-and-Language Representation Learning , 2020, NeurIPS.
[6] Furu Wei,et al. VL-BERT: Pre-training of Generic Visual-Linguistic Representations , 2019, ICLR.
[7] Yang Feng,et al. Unsupervised Image Captioning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Stefan Lee,et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks , 2019, NeurIPS.
[9] Radu Soricut,et al. Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning , 2018, ACL.
[10] Yu Cheng,et al. UNITER: UNiversal Image-TExt Representation Learning , 2019, ECCV.
[11] Mohit Bansal,et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers , 2019, EMNLP.
[12] Nong Xiao,et al. Heterogeneous Graph Learning for Visual Commonsense Reasoning , 2019, NeurIPS.
[13] Larry S. Davis,et al. Explicit Bias Discovery in Visual Question Answering Models , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Chao Wang,et al. Explicit Utilization of General Knowledge in Machine Reading Comprehension , 2018, ACL.
[15] Alec Radford,et al. Improving Language Understanding by Generative Pre-Training , 2018 .
[16] Yuxing Tang,et al. Large Scale Semi-Supervised Object Detection Using Visual and Semantic Knowledge Transfer , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Tom Sercu,et al. Adversarial Semantic Alignment for Improved Image Captions , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Sanja Fidler,et al. Towards Diverse and Natural Image Descriptions via a Conditional GAN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[19] Hao Tian,et al. ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph , 2020, AAAI.
[20] Yi Yang,et al. Connective Cognition Network for Directional Visual Commonsense Reasoning , 2019, NeurIPS.
[21] Ahmed El Kholy,et al. UNITER: Learning UNiversal Image-TExt Representations , 2019, ECCV 2020.
[22] Ali Farhadi,et al. From Recognition to Cognition: Visual Commonsense Reasoning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[25] Sanja Fidler,et al. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[26] Hung-yi Lee,et al. DyKgChat: Benchmarking Dialogue Generation Grounding on Dynamic Knowledge Graphs , 2019, EMNLP.
[27] Todor Mihaylov,et al. Knowledgeable Reader: Enhancing Cloze-Style Reading Comprehension with External Commonsense Knowledge , 2018, ACL.
[28] Nan Duan,et al. Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training , 2019, AAAI.
[29] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[30] Cordelia Schmid,et al. VideoBERT: A Joint Model for Video and Language Representation Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[31] Zhe Zhao,et al. K-BERT: Enabling Language Representation with Knowledge Graph , 2019, AAAI.
[32] Cho-Jui Hsieh,et al. VisualBERT: A Simple and Performant Baseline for Vision and Language , 2019, ArXiv.
[33] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[34] Yang Feng,et al. Knowledge Diffusion for Neural Dialogue Generation , 2018, ACL.
[35] Chen Chen,et al. Improving Image Captioning with Conditional Generative Adversarial Nets , 2018, AAAI.
[36] Peng Gao,et al. Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[37] David Reitter,et al. Fusion of Detected Objects in Text for Visual Question Answering , 2019, EMNLP.
[38] Ali Farhadi,et al. Are Elephants Bigger than Butterflies? Reasoning about Sizes of Objects , 2016, AAAI.
[39] Yichen Wei,et al. Relation Networks for Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[40] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[41] George Kurian,et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.
[42] Zhou Su,et al. Learning Visual Knowledge Memory Networks for Visual Question Answering , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[43] Catherine Havasi,et al. ConceptNet 5.5: An Open Multilingual Graph of General Knowledge , 2016, AAAI.