Exploring the Limits of Out-of-Distribution Detection

Near out-of-distribution (OOD) detection is a major challenge for deep neural networks. We demonstrate that large-scale pre-trained transformers can significantly improve the state of the art (SOTA) on a range of near OOD tasks across different data modalities. For instance, on CIFAR-100 vs. CIFAR-10 OOD detection, we improve the AUROC from 85% (current SOTA) to 96% using Vision Transformers pre-trained on ImageNet-21k. On a challenging genomics OOD detection benchmark, we improve the AUROC from 66% to 77% using transformers and unsupervised pre-training. To further improve performance, we explore the few-shot outlier exposure setting, where a few examples from outlier classes may be available; we show that pre-trained transformers are particularly well-suited for outlier exposure, and that the AUROC of OOD detection on CIFAR-100 vs. CIFAR-10 can be improved to 98.7% with just one image per OOD class, and to 99.46% with ten images per OOD class. For multi-modal image-text pre-trained transformers such as CLIP, we explore a new way of using the names of outlier classes as the sole source of information, without any accompanying images, and show that this outperforms the previous SOTA on standard vision OOD benchmark tasks.
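The abstract does not spell out the detection score or the evaluation metric, so the following is an illustrative sketch rather than the authors' exact method: it implements the common maximum-softmax-probability (MSP) OOD baseline together with the AUROC metric reported above, computed via the Mann-Whitney U statistic. The function names and toy logits are hypothetical.

```python
import numpy as np

def msp_ood_score(logits: np.ndarray) -> np.ndarray:
    """MSP baseline: score = -max_k softmax(logits)_k, so higher means 'more OOD'."""
    z = logits - logits.max(axis=1, keepdims=True)  # stabilise the exponentials
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return -probs.max(axis=1)

def auroc(in_scores: np.ndarray, out_scores: np.ndarray) -> float:
    """AUROC for separating OOD (positive) from in-distribution samples,
    computed from the Mann-Whitney U statistic; ties receive average ranks."""
    scores = np.concatenate([in_scores, out_scores])
    n_in, n_out = len(in_scores), len(out_scores)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)  # 1-based ranks
    for v in np.unique(scores):                   # average ranks over ties
        mask = scores == v
        ranks[mask] = ranks[mask].mean()
    u = ranks[n_in:].sum() - n_out * (n_out + 1) / 2
    return u / (n_in * n_out)

# Toy demo: confident (in-distribution-like) vs. flat (OOD-like) logits.
in_logits = np.array([[8.0, 0.0, 0.0], [7.0, 1.0, 0.0]])
out_logits = np.array([[1.0, 1.0, 0.8], [0.5, 0.4, 0.6]])
print(auroc(msp_ood_score(in_logits), msp_ood_score(out_logits)))  # prints 1.0
```

In practice the logits would come from a classifier head on top of the pre-trained transformer; the paper's improvements concern the quality of those representations, not the scoring rule itself.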
