Robust Out-of-distribution Detection in Neural Networks

Detecting anomalous inputs is critical for safely deploying deep learning models in the real world. Existing approaches for detecting out-of-distribution (OOD) examples work well when evaluated on natural samples drawn from a distribution sufficiently different from the training distribution. However, in this paper we show that existing detection mechanisms can be extremely brittle when evaluated on inputs with minimal adversarial perturbations that do not change their semantics. Formally, we introduce a novel and challenging problem, Robust Out-of-Distribution Detection, and propose an algorithm that can fool existing OOD detectors by adding small perturbations to the inputs while preserving their semantics and thus their distributional membership. We take a first step toward solving this challenge and propose an effective algorithm called ALOE, which performs robust training by exposing the model to both adversarially crafted inlier and outlier examples. Our method can be flexibly combined with existing detection methods, rendering them robust. On common benchmark datasets, we show that ALOE substantially improves the robustness of state-of-the-art OOD detection, with a 58.4% AUROC improvement on CIFAR-10 and a 46.59% improvement on CIFAR-100. Finally, we provide a theoretical analysis of our method that underpins these empirical results.
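The abstract only sketches the training objective, so the following is a minimal PyTorch sketch of one possible ALOE-style training step in that spirit, not the authors' released implementation: PGD crafts L-infinity perturbations of both inliers (ascending the classification loss) and outliers (ascending a cross-entropy-to-uniform outlier-exposure loss), and the model is then updated on both adversarial batches. The helper names (pgd_perturb, uniform_ce, aloe_step), the PGD hyperparameters, and the weight lam are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def pgd_perturb(model, x, loss_fn, eps=8/255, alpha=2/255, steps=5):
        # Multi-step L-inf PGD: ascend loss_fn(model(x + delta)) within an eps-ball.
        delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
        for _ in range(steps):
            loss_fn(model(x + delta)).backward()
            with torch.no_grad():
                delta += alpha * delta.grad.sign()
                delta.clamp_(-eps, eps)
            delta.grad.zero_()
        return (x + delta).clamp(0, 1).detach()  # assumes inputs scaled to [0, 1]

    def uniform_ce(logits):
        # Cross-entropy to the uniform label distribution (outlier-exposure loss).
        return -F.log_softmax(logits, dim=1).mean()

    def aloe_step(model, optimizer, x_in, y_in, x_out, lam=0.5):
        model.eval()  # keep batch-norm statistics fixed while crafting attacks
        x_in_adv = pgd_perturb(model, x_in, lambda z: F.cross_entropy(z, y_in))
        x_out_adv = pgd_perturb(model, x_out, uniform_ce)
        model.train()
        optimizer.zero_grad()  # discard parameter gradients accumulated during PGD
        loss = F.cross_entropy(model(x_in_adv), y_in) \
               + lam * uniform_ce(model(x_out_adv))
        loss.backward()
        optimizer.step()
        return loss.item()

In practice, x_out would be drawn from an auxiliary outlier dataset, as in outlier-exposure training; the inner maximization uses the same loss as the outer minimization for each term, which is what makes the training robust to the perturbation attack described above.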
