Learning Transferable Visual Models From Natural Language Supervision
Ilya Sutskever | Alec Radford | Gabriel Goh | Chris Hallacy | Girish Sastry | Aditya Ramesh | Amanda Askell | Sandhini Agarwal | Jack Clark | Jong Wook Kim | Pamela Mishkin | Gretchen Krueger
[1] Quoc V. Le,et al. Self-Training With Noisy Student Improves ImageNet Classification , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Sanja Fidler,et al. Skip-Thought Vectors , 2015, NIPS.
[3] Joel Nothman,et al. SciPy 1.0-Fundamental Algorithms for Scientific Computing in Python , 2019, ArXiv.
[4] C. V. Jawahar,et al. Self-Supervised Learning of Visual Features through Embedding Images into Text Topic Spaces , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Larry S. Davis,et al. AVSS 2011 demo session: A large-scale benchmark dataset for event recognition in surveillance video , 2011, AVSS.
[6] Jean-Marc Odobez,et al. Topic models for scene analysis and abnormality detection , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.
[7] Andreas Dengel,et al. EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification , 2017, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.
[8] Yahia Saeed Assiri. Stochastic Optimization of Plain Convolutional Neural Networks with Simple Methods , 2019, MLDM.
[9] Bohyung Han,et al. CPlaNet: Enhancing Image Geolocalization by Combinatorial Partitioning of Maps , 2018, ECCV.
[10] Tim Salimans,et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks , 2016, NIPS.
[11] Cyrus Rashtchian,et al. Collecting Image Annotations Using Amazon’s Mechanical Turk , 2010, Mturk@HLT-NAACL.
[12] Frank Hutter,et al. SGDR: Stochastic Gradient Descent with Warm Restarts , 2016, ICLR.
[13] Adam Tauman Kalai,et al. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings , 2016, NIPS.
[14] Radu Soricut,et al. Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning , 2018, ACL.
[15] Jason Weston,et al. Dialog-based Language Learning , 2016, NIPS.
[16] Zhi Zhang,et al. Bag of Tricks for Image Classification with Convolutional Neural Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Inioluwa Deborah Raji,et al. Saving Face: Investigating the Ethical Concerns of Facial Recognition Auditing , 2020, AIES.
[18] Fei-Fei Li,et al. Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[19] Omer Levy,et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding , 2018, BlackboxNLP@EMNLP.
[20] Tom M. Mitchell,et al. Joint Concept Learning and Semantic Parsing from Natural Language Explanations , 2017, EMNLP.
[21] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[22] Ivan Laptev,et al. RareAct: A video dataset of unusual interactions , 2020, ArXiv.
[23] Andrew Zisserman,et al. Self-Supervised MultiModal Versatile Networks , 2020, NeurIPS.
[24] Geoffrey C. Bowker,et al. Unsupervised by any other name: Hidden layers of knowledge production in artificial intelligence on social media , 2019, Big Data & Society.
[25] Rico Sennrich,et al. Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.
[26] Joshua B. Tenenbaum,et al. Building machines that learn and think like people , 2016, Behavioral and Brain Sciences.
[27] Mark Chen,et al. Generative Pretraining From Pixels , 2020, ICML.
[28] Noah D. Goodman,et al. Shaping Visual Representations with Language for Few-Shot Classification , 2019, ACL.
[29] Sebastian Ruder,et al. Universal Language Model Fine-tuning for Text Classification , 2018, ACL.
[30] Richard Socher,et al. Learned in Translation: Contextualized Word Vectors , 2017, NIPS.
[31] Xiaohua Zhai,et al. A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark , 2019 .
[32] Alexander D'Amour,et al. Underspecification Presents Challenges for Credibility in Modern Machine Learning , 2020, J. Mach. Learn. Res..
[33] Karan Desai,et al. VirTex: Learning Visual Representations from Textual Annotations , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Pietro Perona,et al. Learning object categories from Google's image search , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.
[35] Ali Razavi,et al. Data-Efficient Image Recognition with Contrastive Predictive Coding , 2019, ICML.
[36] Ruslan Salakhutdinov,et al. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models , 2014, ArXiv.
[37] 知秀 柴田. Understand It in 5 Minutes!? Skimming Famous Papers: Jacob Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2020 .
[38] Quoc V. Le,et al. Do Better ImageNet Models Transfer Better? , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Antonio Torralba,et al. 80 Million Tiny Images: A Large Dataset for Non-parametric Object and Scene Recognition , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[41] Andreas Griewank,et al. Algorithm 799: revolve: an implementation of checkpointing for the reverse or adjoint mode of computational differentiation , 2000, TOMS.
[42] K. Jarrod Millman,et al. Array programming with NumPy , 2020, Nat..
[43] Alexei A. Efros,et al. IM2GPS: estimating geographic information from a single image , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.
[44] Francesco Locatello,et al. A Sober Look at the Unsupervised Learning of Disentangled Representations and their Evaluation , 2020, J. Mach. Learn. Res..
[45] Andrew Y. Ng,et al. Reading Digits in Natural Images with Unsupervised Feature Learning , 2011 .
[46] Douwe Kiela,et al. The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes , 2020, NeurIPS.
[47] Lin Su,et al. ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data , 2020, ArXiv.
[48] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[49] Benjamin Recht,et al. Measuring Robustness to Natural Distribution Shifts in Image Classification , 2020, NeurIPS.
[50] Andrew Zisserman,et al. Spatial Transformer Networks , 2015, NIPS.
[51] Xinlei Chen,et al. Towards VQA Models That Can Read , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[52] Kevin Gimpel,et al. Gaussian Error Linear Units (GELUs) , 2016 .
[53] Oriol Vinyals,et al. Representation Learning with Contrastive Predictive Coding , 2018, ArXiv.
[54] Lei Yu,et al. Learning and Evaluating General Linguistic Intelligence , 2019, ArXiv.
[55] Michael S. Bernstein,et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.
[56] Geoffrey E. Hinton,et al. A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.
[57] Kimmo Kärkkäinen,et al. FairFace: Face Attribute Dataset for Balanced Race, Gender, and Age , 2019, ArXiv.
[58] Hao Wu,et al. Mixed Precision Training , 2017, ICLR.
[59] Mario Lucic,et al. Are GANs Created Equal? A Large-Scale Study , 2017, NeurIPS.
[60] Ilya Sutskever,et al. Jukebox: A Generative Model for Music , 2020, ArXiv.
[61] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[62] Xiaoqiang Lu,et al. Remote Sensing Image Scene Classification: Benchmark and State of the Art , 2017, Proceedings of the IEEE.
[63] Xinlei Chen,et al. Webly Supervised Learning of Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[64] Peter Young,et al. Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics , 2013, J. Artif. Intell. Res..
[65] Ahmed El Kholy,et al. UNITER: Learning UNiversal Image-TExt Representations , 2019, ECCV 2020.
[66] Andreas Geiger,et al. Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[67] Danqi Chen,et al. Making Pre-trained Language Models Better Few-shot Learners , 2021, ACL/IJCNLP.
[68] Richard Socher,et al. The Natural Language Decathlon: Multitask Learning as Question Answering , 2018, ArXiv.
[69] Yoshua Bengio,et al. A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..
[70] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[71] Alec Radford,et al. Scaling Laws for Neural Language Models , 2020, ArXiv.
[72] Christoph H. Lampert,et al. Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[73] Dawn Song,et al. Pretrained Transformers Improve Out-of-Distribution Robustness , 2020, ACL.
[74] Jason Weston,et al. Large scale image annotation: learning to rank with joint word-image embeddings , 2010, Machine Learning.
[75] Kaiming He,et al. Improved Baselines with Momentum Contrastive Learning , 2020, ArXiv.
[76] Jason Weston,et al. Learning through Dialogue Interactions by Asking Questions , 2016, ICLR.
[77] Percy Liang,et al. ExpBERT: Representation Engineering with Natural Language Explanations , 2020, ACL.
[78] Phillip Isola,et al. Contrastive Multiview Coding , 2019, ECCV.
[79] Richard Zhang,et al. Making Convolutional Networks Shift-Invariant Again , 2019, ICML.
[80] Andrew Zisserman,et al. End-to-End Learning of Visual Representations From Uncurated Instructional Videos , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[81] Rob Fergus,et al. Visualizing and Understanding Convolutional Networks , 2013, ECCV.
[82] Thomas G. Dietterich,et al. Benchmarking Neural Network Robustness to Common Corruptions and Perturbations , 2018, ICLR.
[83] Sanja Fidler,et al. Predicting Deep Zero-Shot Convolutional Neural Networks Using Textual Descriptions , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[84] Lina J. Karam,et al. A Study and Comparison of Human and Deep Learning Recognition Performance under Visual Distortions , 2017, 2017 26th International Conference on Computer Communication and Networks (ICCCN).
[85] Demis Hassabis,et al. Grounded Language Learning in a Simulated 3D World , 2017, ArXiv.
[86] Alec Radford,et al. Improving Language Understanding by Generative Pre-Training , 2018 .
[87] Katja Markert,et al. Learning Models for Object Recognition from Natural Language Descriptions , 2009, BMVC.
[88] Andrew Zisserman,et al. A Short Note on the Kinetics-700 Human Action Dataset , 2019, ArXiv.
[89] Benjamin Recht,et al. The Effect of Natural Distribution Shift on Question Answering Models , 2020, ICML.
[90] Nathan Jacobs,et al. Revisiting IM2GPS in the Deep Learning Era , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[91] Yuxin Peng,et al. Fine-Grained Image Classification via Combining Vision and Language , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[92] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[93] Sergey Ioffe,et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.
[94] Brian A. Nosek,et al. Harvesting implicit group attitudes and beliefs from a demonstration web site , 2002 .
[95] D. Song,et al. The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization , 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[96] Zhitao Gong,et al. Strike (With) a Pose: Neural Networks Are Easily Fooled by Strange Poses of Familiar Objects , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[97] Vicente Ordonez,et al. Im2Text: Describing Images Using 1 Million Captioned Photographs , 2011, NIPS.
[98] Aleksander Madry,et al. Adversarial Examples Are Not Bugs, They Are Features , 2019, NeurIPS.
[99] Kaiming He,et al. Momentum Contrast for Unsupervised Visual Representation Learning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[100] Y. Mori,et al. Image-to-word transformation based on dividing and vector quantizing images with words , 1999 .
[101] Jiebo Luo,et al. TAP: Text-Aware Pre-training for Text-VQA and Text-Caption , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[102] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[103] Tal Linzen,et al. How Can We Accelerate Progress Towards Human-like Linguistic Generalization? , 2020, ACL.
[104] Nitish Srivastava,et al. Multimodal learning with deep Boltzmann machines , 2012, J. Mach. Learn. Res..
[105] Matthias Bethge,et al. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness , 2018, ICLR.
[106] George A. Miller,et al. WordNet: A Lexical Database for English , 1995, HLT.
[107] Quoc V. Le,et al. Semi-supervised Sequence Learning , 2015, NIPS.
[108] C. V. Jawahar,et al. Cats and dogs , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[109] Sinan Kalkan,et al. Late Temporal Modeling in 3D CNN Architectures with BERT for Action Recognition , 2020, ECCV Workshops.
[110] Christopher D. Manning,et al. Contrastive Learning of Medical Visual Representations from Paired Images and Text , 2020, MLHC.
[111] David J. Fleet,et al. VSE++: Improving Visual-Semantic Embeddings with Hard Negatives , 2017, BMVC.
[112] Boris Katz,et al. ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models , 2019, NeurIPS.
[113] Shruti Bhargava,et al. Exposing and Correcting the Gender Bias in Image Captioning Datasets and Models , 2019, ArXiv.
[114] Shoshana Zuboff,et al. Big other: surveillance capitalism and the prospects of an information civilization , 2015, J. Inf. Technol..
[115] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[116] Babak Saleh,et al. Write a Classifier: Zero-Shot Learning Using Purely Textual Descriptions , 2013, 2013 IEEE International Conference on Computer Vision.
[117] Yue Wang,et al. Rethinking Few-Shot Image Classification: a Good Embedding Is All You Need? , 2020, ECCV.
[118] Ralph Ewerth,et al. Geolocation Estimation of Photos Using a Hierarchical Model and Scene Classification , 2018, ECCV.
[119] Trevor Darrell,et al. Learning Visual Representations using Images with Captions , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.
[120] Allan Jabri,et al. Learning Visual Features from Large Weakly Supervised Data , 2015, ECCV.
[121] Carly R. Knight,et al. Diagnosing Gender Bias in Image Recognition Systems , 2020, Socius : sociological research for a dynamic world.
[122] C. V. Jawahar,et al. Scene Text Recognition using Higher Order Language Priors , 2009, BMVC.
[123] Yann LeCun,et al. The mnist database of handwritten digits , 2005 .
[124] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[125] Helen Yannakoudakis,et al. A Multimodal Framework for the Detection of Hateful Memes , 2020, ArXiv.
[126] Quoc V. Le,et al. Grounded Compositional Semantics for Finding and Describing Images with Sentences , 2014, TACL.
[127] David A. Shamma,et al. YFCC100M , 2015, Commun. ACM.
[128] Georg Heigold,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2021, ICLR.
[129] Nan Duan,et al. Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training , 2019, AAAI.
[130] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[131] Fei-Fei Li,et al. Video Event Understanding Using Natural Language Descriptions , 2013, 2013 IEEE International Conference on Computer Vision.
[132] Allan Jabri,et al. Learning Visual N-Grams from Web Data , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[133] Christopher Ré,et al. Training Classifiers with Natural Language Explanations , 2018, ACL.
[134] Yoshua Bengio,et al. Challenges in representation learning: A report on three machine learning contests , 2013, Neural Networks.
[135] Armand Joulin,et al. Deep Fragment Embeddings for Bidirectional Image Sentence Mapping , 2014, NIPS.
[136] Andrew Y. Ng,et al. Zero-Shot Learning Through Cross-Modal Transfer , 2013, NIPS.
[137] Christopher Potts,et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank , 2013, EMNLP.
[138] Stefan Lee,et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks , 2019, NeurIPS.
[139] Colin Raffel,et al. Realistic Evaluation of Deep Semi-Supervised Learning Algorithms , 2018, NeurIPS.
[140] Peter Young,et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions , 2014, TACL.
[141] Mubarak Shah,et al. UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild , 2012, ArXiv.
[142] David A. Ross,et al. Learning Video Representations from Textual Web Supervision , 2020, ArXiv.
[143] Michal Valko,et al. Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning , 2020, NeurIPS.
[144] Jonathon Shlens,et al. Explaining and Harnessing Adversarial Examples , 2014, ICLR.
[145] Jaehoon Lee,et al. On Empirical Comparisons of Optimizers for Deep Learning , 2019, ArXiv.
[146] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .
[147] R. Gonzales. Dark matters: on the surveillance of blackness , 2016 .
[148] Jianfeng Gao,et al. Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks , 2020, ECCV.
[149] Kenneth Ward Church,et al. Word Association Norms, Mutual Information, and Lexicography , 1989, ACL.
[150] Quoc V. Le,et al. Distributed Representations of Sentences and Documents , 2014, ICML.
[151] Yoshua Bengio,et al. Zero-data Learning of New Tasks , 2008, AAAI.
[152] J. Overhage,et al. Sorting Things Out: Classification and Its Consequences , 2001, Annals of Internal Medicine.
[153] Ivan Laptev,et al. HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[154] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[155] Regina Barzilay,et al. Language Understanding for Text-based Games using Deep Reinforcement Learning , 2015, EMNLP.
[156] Benjamin Recht,et al. Do Image Classifiers Generalize Across Time , 2019 .
[157] Yu Cheng,et al. Large-Scale Adversarial Training for Vision-and-Language Representation Learning , 2020, NeurIPS.
[158] Max Welling,et al. Rotation Equivariant CNNs for Digital Pathology , 2018, MICCAI.
[159] Andrew Zisserman,et al. Deep Structured Output Learning for Unconstrained Text Recognition , 2014, ICLR.
[160] Alec Radford,et al. Release Strategies and the Social Impacts of Language Models , 2019, ArXiv.
[161] Frank Hutter,et al. Decoupled Weight Decay Regularization , 2017, ICLR.
[162] Mohit Bansal,et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers , 2019, EMNLP.
[163] Dahua Lin,et al. Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination , 2018, ArXiv.
[164] Ilya Kostrikov,et al. PlaNet - Photo Geolocation with Convolutional Neural Networks , 2016, ECCV.
[165] Quoc V. Le,et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.
[166] David A. Forsyth,et al. Matching Words and Pictures , 2003, J. Mach. Learn. Res..
[167] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[168] Mohammad Norouzi,et al. Big Self-Supervised Models are Strong Semi-Supervised Learners , 2020, NeurIPS.
[169] Honglak Lee,et al. An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.
[170] Lukasz Kaiser,et al. Generating Wikipedia by Summarizing Long Sequences , 2018, ICLR.
[171] Timnit Gebru,et al. Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification , 2018, FAT.
[172] D. Fitch,et al. Review of "Algorithms of oppression: how search engines reinforce racism," by Noble, S. U. (2018). New York, New York: NYU Press. , 2018, CDQR.
[173] Julien Perez,et al. Learning Visual Representations with Caption Annotations , 2020, ECCV.
[174] Benjamin Recht,et al. Do ImageNet Classifiers Generalize to ImageNet? , 2019, ICML.
[175] Yang Yang,et al. Deep Learning Scaling is Predictable, Empirically , 2017, ArXiv.
[176] M. Bethge,et al. Shortcut learning in deep neural networks , 2020, Nature Machine Intelligence.
[177] Johannes Stallkamp,et al. The German Traffic Sign Recognition Benchmark: A multi-class classification competition , 2011, The 2011 International Joint Conference on Neural Networks.
[178] Michael I. Jordan,et al. Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..
[179] Alexander Kuhnle,et al. ShapeWorld - A new test methodology for multimodal language understanding , 2017, ArXiv.
[180] Colin Raffel,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..
[181] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[182] Zhou Yu,et al. ALICE: Active Learning with Contrastive Natural Language Explanations , 2020, EMNLP.
[183] Kaiming He,et al. Exploring the Limits of Weakly Supervised Pretraining , 2018, ECCV.
[184] Hao Wang,et al. All You Need Is Boundary: Toward Arbitrary-Shaped Text Spotting , 2019, AAAI.
[185] R Devon Hjelm,et al. Learning Representations by Maximizing Mutual Information Across Views , 2019, NeurIPS.
[186] Luke S. Zettlemoyer,et al. Deep Contextualized Word Representations , 2018, NAACL.
[187] Kihyuk Sohn,et al. Improved Deep Metric Learning with Multi-class N-pair Loss Objective , 2016, NIPS.
[188] Tianqi Chen,et al. Training Deep Nets with Sublinear Memory Cost , 2016, ArXiv.
[189] Li Fei-Fei,et al. CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[190] Ali Farhadi,et al. Learning Everything about Anything: Webly-Supervised Visual Concept Learning , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[191] Amit K. Roy-Chowdhury,et al. Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval , 2018, ACM Multimedia.
[192] Gaël Varoquaux,et al. Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..
[193] Eric P. Xing,et al. Learning Robust Global Representations by Penalizing Local Predictive Power , 2019, NeurIPS.
[194] Matthijs Douze,et al. Fixing the train-test resolution discrepancy , 2019, NeurIPS.
[195] Dan Klein,et al. Learning with Latent Language , 2017, NAACL.
[196] Alec Radford,et al. Learning Transferable Visual Models From Natural Language Supervision , 2021 .
[197] Jason Weston,et al. Learning from Dialogue after Deployment: Feed Yourself, Chatbot! , 2019, ACL.
[198] Hao Tian,et al. ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph , 2020, AAAI.