[1] Holger Schwenk,et al. Supervised Learning of Universal Sentence Representations from Natural Language Inference Data , 2017, EMNLP.
[2] Jordi Pont-Tuset,et al. Connecting Vision and Language with Localized Narratives , 2019, ECCV.
[3] Christopher D. Manning,et al. Get To The Point: Summarization with Pointer-Generator Networks , 2017, ACL.
[4] Noah A. Smith. Contextual Word Representations: A Contextual Introduction , 2019, ArXiv.
[5] Dan Klein,et al. Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers , 2020, ArXiv.
[6] Donald E. Knuth,et al. The Art of Computer Programming: Volume 3: Sorting and Searching , 1998 .
[7] Christopher D. Manning,et al. Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol.
[8] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Sanja Fidler,et al. What Are You Talking About? Text-to-Image Coreference , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[10] Hai Zhao,et al. Neural Machine Translation with Universal Visual Representation , 2020, ICLR.
[11] Alexander M. Rush,et al. What is Learned in Visually Grounded Neural Syntax Acquisition , 2020, ACL.
[12] Furu Wei,et al. VL-BERT: Pre-training of Generic Visual-Linguistic Representations , 2019, ICLR.
[13] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[14] Laure Soulier,et al. Incorporating Visual Semantics into Sentence Representations within a Grounded Space , 2019, EMNLP.
[15] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[16] Marie-Francine Moens,et al. Imagined Visual Representations as Multimodal Embeddings , 2017, AAAI.
[17] David Thomas,et al. The Art in Computer Programming , 2001 .
[18] Jianfeng Gao,et al. Unified Vision-Language Pre-Training for Image Captioning and VQA , 2020, AAAI.
[19] Svetlana Lazebnik,et al. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[20] Yu Cheng,et al. UNITER: UNiversal Image-TExt Representation Learning , 2019, ECCV.
[21] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[22] Luke S. Zettlemoyer,et al. Deep Contextualized Word Representations , 2018, NAACL.
[23] Sanja Fidler,et al. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[24] Stephen Clark,et al. Visual Bilingual Lexicon Induction with Transferred ConvNet Features , 2015, EMNLP.
[25] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[26] Peter Young,et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions , 2014, TACL.
[27] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[28] Alec Radford,et al. Improving Language Understanding by Generative Pre-Training , 2018 .
[29] Jacob Andreas,et al. Experience Grounds Language , 2020, EMNLP.
[30] Yiming Yang,et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding , 2019, NeurIPS.
[31] Samuel R. Bowman,et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference , 2017, NAACL.
[32] Gordon Christie,et al. Resolving Language and Vision Ambiguities Together: Joint Segmentation & Prepositional Attachment Resolution in Captioned Scenes , 2016, EMNLP.
[33] Zhuowen Tu,et al. Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Marc'Aurelio Ranzato,et al. DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.
[35] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[36] Ali Farhadi,et al. Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping , 2020, ArXiv.
[37] Yejin Choi,et al. SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference , 2018, EMNLP.
[38] Allan Jabri,et al. Learning Visually Grounded Sentence Representations , 2018, NAACL.
[39] Jianfeng Gao,et al. Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks , 2020, ECCV.
[40] Sanja Fidler,et al. Skip-Thought Vectors , 2015, NIPS.
[41] Quoc V. Le,et al. Semi-supervised Sequence Learning , 2015, NIPS.
[42] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[43] Michael S. Bernstein,et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.
[44] Quoc V. Le,et al. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators , 2020, ICLR.
[45] Percy Liang,et al. Know What You Don’t Know: Unanswerable Questions for SQuAD , 2018, ACL.
[46] Alex Pentland,et al. Learning words from sights and sounds: a computational model , 2002, Cogn. Sci..
[47] Guillaume Lample,et al. Cross-lingual Language Model Pretraining , 2019, NeurIPS.
[48] Emily M. Bender,et al. Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data , 2020, ACL.
[49] Cho-Jui Hsieh,et al. VisualBERT: A Simple and Performant Baseline for Vision and Language , 2019, ArXiv.
[50] Angeliki Lazaridou,et al. Combining Language and Vision with a Multimodal Skip-gram Model , 2015, NAACL.
[51] Kevin Gimpel,et al. Visually Grounded Neural Syntax Acquisition , 2019, ACL.
[52] Stefano Ermon,et al. Learning and Inference via Maximum Inner Product Search , 2016, ICML.
[53] Mohit Bansal,et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers , 2019, EMNLP.
[54] David Mimno,et al. Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents , 2019, EMNLP.
[55] Rémi Louf,et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing , 2019, ArXiv.
[56] Jian Zhang,et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text , 2016, EMNLP.
[57] Jeff Johnson,et al. Billion-Scale Similarity Search with GPUs , 2017, IEEE Transactions on Big Data.
[58] Colin Raffel,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..
[59] Kevin Gimpel,et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations , 2019, ICLR.
[60] Licheng Yu,et al. UNITER: Learning UNiversal Image-TExt Representations , 2019, ArXiv.
[61] Lucia Specia,et al. Predicting Actions to Help Predict Translations , 2019, ArXiv.
[62] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[63] Demis Hassabis,et al. Grounded Language Learning in a Simulated 3D World , 2017, ArXiv.
[64] Richard Socher,et al. Pointer Sentinel Mixture Models , 2016, ICLR.
[65] Veselin Stoyanov,et al. Unsupervised Cross-lingual Representation Learning at Scale , 2019, ACL.
[66] Christopher Potts,et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank , 2013, EMNLP.
[67] P. Bloom. How children learn the meanings of words , 2000 .
[68] Omer Levy,et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding , 2018, BlackboxNLP@EMNLP.
[69] Stefan Lee,et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks , 2019, NeurIPS.
[70] Lucia Specia,et al. Distilling Translations with Visual Awareness , 2019, ACL.
[71] Marie-Francine Moens,et al. Multi-Modal Representations for Improved Bilingual Lexicon Learning , 2016, ACL.
[72] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[73] Radu Soricut,et al. Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning , 2018, ACL.
[74] Quoc V. Le,et al. Distributed Representations of Sentences and Documents , 2014, ICML.
[75] N. Altman. An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression , 1992 .