Improving Deterministic Uncertainty Estimation in Deep Learning for Classification and Regression

We propose a new model that estimates uncertainty in a single forward pass and works on both classification and regression problems. Our approach combines a bi-Lipschitz feature extractor with an inducing point approximate Gaussian process, offering robust and principled uncertainty estimation. This can be seen as a refinement of Deep Kernel Learning (DKL), with our changes allowing DKL to match softmax neural networks accuracy. Our method overcomes the limitations of previous work addressing deterministic uncertainty quantification, such as the dependence of uncertainty on ad hoc hyper-parameters. Our method achieves SotA accuracy, 96.2% on CIFAR-10, while maintaining the speed of softmax models, and provides uncertainty estimates that outperform previous single forward pass uncertainty models. Finally, we demonstrate our method on a recently introduced benchmark for uncertainty in regression: treatment deferral in causal models for personalized medicine.

[1]  Nikos Komodakis,et al.  Wide Residual Networks , 2016, BMVC.

[2]  Yee Whye Teh,et al.  Do Deep Generative Models Know What They Don't Know? , 2018, ICLR.

[3]  Andrew Y. Ng,et al.  Reading Digits in Natural Images with Unsupervised Feature Learning , 2011 .

[4]  Robert P. Lieli,et al.  Estimating Conditional Average Treatment Effects , 2014 .

[5]  Joost R. van Amersfoort,et al.  Simple and Scalable Epistemic Uncertainty Estimation Using a Single Deep Deterministic Neural Network , 2020, ICML 2020.

[6]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[7]  Arnold W. M. Smeulders,et al.  i-RevNet: Deep Invertible Networks , 2018, ICLR.

[8]  Andrew Gordon Wilson,et al.  Stochastic Variational Deep Kernel Learning , 2016, NIPS.

[9]  Yarin Gal,et al.  BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning , 2019, NeurIPS.

[10]  Yarin Gal,et al.  Uncertainty in Deep Learning , 2016 .

[11]  Simon Haykin,et al.  GradientBased Learning Applied to Document Recognition , 2001 .

[12]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[13]  Alexander G. de G. Matthews,et al.  Scalable Gaussian process inference using variational methods , 2017 .

[14]  Aaron C. Courville,et al.  Improved Training of Wasserstein GANs , 2017, NIPS.

[15]  Carl E. Rasmussen,et al.  Manifold Gaussian Processes for regression , 2014, 2016 International Joint Conference on Neural Networks (IJCNN).

[16]  James Hensman,et al.  Scalable Variational Gaussian Process Classification , 2014, AISTATS.

[17]  Carl E. Rasmussen,et al.  Rates of Convergence for Sparse Variational Gaussian Process Regression , 2019, ICML.

[18]  T. Weber,et al.  A case for new neural network smoothness constraints , 2020, ICBINB@NeurIPS.

[19]  Jennifer L. Hill,et al.  Bayesian Nonparametric Modeling for Causal Inference , 2011 .

[20]  Yao Zhang,et al.  Learning Overlapping Representations for the Estimation of Individualized Treatment Effects , 2020, AISTATS.

[21]  Uri Shalit,et al.  Identifying Causal Effect Inference Failure with Uncertainty-Aware Models , 2020, NeurIPS.

[22]  Yuichi Yoshida,et al.  Spectral Normalization for Generative Adversarial Networks , 2018, ICLR.

[23]  Benjamin Recht,et al.  Random Features for Large-Scale Kernel Machines , 2007, NIPS.

[24]  David Duvenaud,et al.  Invertible Residual Networks , 2018, ICML.

[25]  Zoubin Ghahramani,et al.  Adversarial Examples, Uncertainty, and Transfer Testing Robustness in Gaussian Process Hybrid Deep Networks , 2017, 1707.02476.

[26]  Benjamin Van Roy,et al.  Deep Exploration via Bootstrapped DQN , 2016, NIPS.

[27]  Andrew Gordon Wilson,et al.  GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration , 2018, NeurIPS.

[28]  Michalis K. Titsias,et al.  Variational Learning of Inducing Variables in Sparse Gaussian Processes , 2009, AISTATS.

[29]  Andrew Gordon Wilson,et al.  Deep Kernel Learning , 2015, AISTATS.

[30]  Carl E. Rasmussen,et al.  Convolutional Gaussian Processes , 2017, NIPS.

[31]  Uri Shalit,et al.  Estimating individual treatment effect: generalization bounds and algorithms , 2016, ICML.

[32]  Zoubin Ghahramani,et al.  Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning , 2015, ICML.

[33]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[34]  Charles Blundell,et al.  Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles , 2016, NIPS.

[35]  Geoffrey E. Hinton,et al.  Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes , 2007, NIPS.

[36]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[37]  Bernhard Pfahringer,et al.  Regularisation of neural networks by enforcing Lipschitz continuity , 2018, Machine Learning.

[38]  Dustin Tran,et al.  Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness , 2020, NeurIPS.

[39]  Max Welling,et al.  Causal Effect Inference with Deep Latent-Variable Models , 2017, NIPS 2017.

[40]  H. Chipman,et al.  BART: Bayesian Additive Regression Trees , 2008, 0806.3286.

[41]  H. Chipman,et al.  Bayesian Additive Regression Trees , 2006 .