Evaluating Robustness of Predictive Uncertainty Estimation: Are Dirichlet-based Models Reliable?

Robustness to adversarial perturbations and accurate uncertainty estimation are crucial for the reliable application of deep learning in real-world settings. Dirichlet-based uncertainty (DBU) models are a family of models that predict the parameters of a Dirichlet distribution (instead of a categorical one) and promise to signal when their predictions should not be trusted: on unknown or ambiguous samples, they are expected to assign high uncertainty. In this work, we show that DBU models with standard training are not robust with respect to three important tasks in the field of uncertainty estimation. In particular, we evaluate how useful the uncertainty estimates are to (1) indicate correctly classified samples and (2) detect adversarial examples that try to fool classification. We further evaluate the reliability of DBU models on the task of (3) distinguishing between in-distribution (ID) and out-of-distribution (OOD) data. To this end, we present the first study of certifiable robustness for DBU models. Furthermore, we propose novel uncertainty attacks that fool models into assigning high confidence to OOD data or low confidence to ID data. Based on our results, we explore the first approaches to making DBU models more robust: we use adversarial training procedures based on label attacks, uncertainty attacks, or random noise, and demonstrate how they affect the robustness of DBU models on ID and OOD data.
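To make the setup concrete, the following is a minimal PyTorch sketch of a DBU-style prediction head that outputs Dirichlet concentration parameters, together with a PGD-style attack on the resulting uncertainty estimate. The names (`DirichletHead`, `uncertainty_attack`), the use of differential entropy as the uncertainty score, and all hyperparameters are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DirichletHead(nn.Module):
    """Illustrative DBU-style head: maps features to Dirichlet
    concentration parameters alpha > 0 instead of a categorical softmax."""

    def __init__(self, in_features: int, num_classes: int):
        super().__init__()
        self.linear = nn.Linear(in_features, num_classes)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Softplus (+1) keeps every concentration parameter strictly positive.
        return F.softplus(self.linear(features)) + 1.0


def differential_entropy(alpha: torch.Tensor) -> torch.Tensor:
    """Differential entropy of Dirichlet(alpha); higher values indicate
    higher distributional uncertainty (e.g. useful for OOD detection)."""
    return torch.distributions.Dirichlet(alpha).entropy()


def uncertainty_attack(model, x, eps=0.1, steps=10, lr=0.01, minimize=True):
    """PGD-style attack on the uncertainty estimate itself: perturb x within
    an L-inf ball of radius eps to make an OOD input look confident
    (minimize entropy) or an ID input look uncertain (maximize entropy)."""
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        alpha = model(x_adv)                       # backbone + DirichletHead
        loss = differential_entropy(alpha).sum()
        grad, = torch.autograd.grad(loss, x_adv)
        step_sign = -1.0 if minimize else 1.0
        with torch.no_grad():
            x_adv += step_sign * lr * grad.sign()
            # Project back to the eps-ball and (assumed) [0, 1] input range.
            x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0.0, 1.0)
        x_adv.requires_grad_(True)
    return x_adv.detach()
```

With a trained DBU model, `differential_entropy(model(x))` would give the distributional-uncertainty score attacked above; a label attack or random-noise variant of adversarial training could, under the same assumptions, swap this loss for a cross-entropy on the predicted class probabilities or for no gradient signal at all.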
