DABS: A Domain-Agnostic Benchmark for Self-Supervised Learning

Self-supervised learning algorithms, including BERT and SimCLR, have enabled significant strides in fields like natural language processing, computer vision, and speech processing. However, the domain-specificity of these algorithms means that solutions must be handcrafted for each new setting, including myriad healthcare, scientific, and multimodal domains. To catalyze progress towards more domain-agnostic methods, we introduce DABS: a Domain-Agnostic Benchmark for Self-supervised learning. To perform well on DABS, an algorithm must be pretrained on six unlabeled datasets from diverse domains: natural images, text, speech recordings, medical imaging, multichannel sensor data, and paired text and images, and then perform well on a set of labeled tasks in each domain. We also present e-Mix and ShED: two baseline domain-agnostic algorithms; their relatively modest performance demonstrates that significant progress is needed before self-supervised learning is an out-of-the-box solution for arbitrary domains. Code for benchmark datasets and baseline algorithms is available at [redacted].

Figure 1: The DABS Benchmark. A domain-agnostic self-supervised algorithm consists of 1) a model architecture, 2) an objective used to pretrain the model on unlabeled data, and 3) a transfer method used to deploy it on a downstream task. A successful algorithm will achieve high performance on downstream tasks while holding these components constant across domains.
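The abstract and Figure 1 imply a simple evaluation protocol: fix one (architecture, pretraining objective, transfer method) triple and apply it unchanged across all six domains, scoring the resulting models on each domain's labeled tasks. The sketch below illustrates that loop; the parameter names (make_encoder, pretrain, transfer, load_unlabeled, load_tasks) and the task interface are hypothetical placeholders, not the actual DABS repository API.

```python
from typing import Callable, Dict, List

def evaluate_algorithm(
    domains: List[str],         # e.g. the six DABS domains
    make_encoder: Callable,     # architecture: domain name -> untrained encoder
    pretrain: Callable,         # objective: (encoder, unlabeled data) -> pretrained encoder
    transfer: Callable,         # transfer method: (encoder, labeled task) -> task model
    load_unlabeled: Callable,   # hypothetical data access, supplied by the caller
    load_tasks: Callable,       # hypothetical: domain name -> list of labeled tasks
) -> Dict[str, float]:
    """Apply one (architecture, objective, transfer) triple, unchanged, in every domain."""
    scores: Dict[str, float] = {}
    for domain in domains:
        # Pretrain on this domain's unlabeled data with the same objective used everywhere.
        encoder = pretrain(make_encoder(domain), load_unlabeled(domain))
        # Transfer the pretrained encoder to each labeled task and average the task scores.
        task_scores = [task.evaluate(transfer(encoder, task)) for task in load_tasks(domain)]
        scores[domain] = sum(task_scores) / len(task_scores)
    return scores
```

Under this framing, a benchmark entry supplies only the three callables; the data loading and downstream evaluation are fixed by the benchmark itself.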
