On Feature Collapse and Deep Kernel Learning for Single Forward Pass Uncertainty

Gaussian processes are often considered a gold standard in uncertainty estimation on low-dimensional data, but they scale poorly to high-dimensional inputs. Deep Kernel Learning (DKL) was introduced to address this: a deep feature extractor transforms the inputs over which the Gaussian process kernel is defined. In practice, however, DKL has been shown to produce unreliable uncertainty estimates. We study why, and show that for certain feature extractors, “far-away” data points are mapped to the same features as training-set points. With this insight, we propose to constrain DKL’s feature extractor to approximately preserve distances through a bi-Lipschitz constraint, yielding a feature space favorable to DKL. The resulting model, DUE, provides uncertainty estimates that outperform previous DKL and single-forward-pass methods, while retaining the speed and accuracy of standard softmax neural networks.
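To make the construction concrete, below is a minimal sketch of the two ingredients the abstract describes: a feature extractor regularized toward bi-Lipschitz behavior, and a sparse variational Gaussian process defined on its output features. This is not the authors' released implementation; it assumes PyTorch and GPyTorch, and all class and variable names (ResidualFeatureExtractor, GPHead, etc.) are illustrative. Note one simplification: `torch.nn.utils.spectral_norm` normalizes each weight's spectral norm to exactly 1, whereas the paper softly bounds it below a coefficient, so here the residual branch is scaled by a constant c < 1 to obtain the lower-Lipschitz bound.

```python
import torch
import torch.nn as nn
import gpytorch


class ResidualFeatureExtractor(nn.Module):
    """Toy extractor: residual blocks with spectral-normalized inner layers.

    With the residual branch a contraction (scaled by c < 1), the map
    h -> h + c * f(h) has sensitivity bounded below as well as above,
    i.e. it is approximately bi-Lipschitz, which is the property that
    discourages feature collapse. The dimension-changing lift/project
    layers are only Lipschitz from above (a simplification of DUE).
    """

    def __init__(self, in_dim: int, feat_dim: int, width: int = 128, c: float = 0.9):
        super().__init__()
        self.c = c
        self.lift = nn.utils.spectral_norm(nn.Linear(in_dim, width))
        self.residual = nn.Sequential(
            nn.utils.spectral_norm(nn.Linear(width, width)),
            nn.ReLU(),
            nn.utils.spectral_norm(nn.Linear(width, width)),
        )
        self.project = nn.utils.spectral_norm(nn.Linear(width, feat_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.lift(x)
        h = h + self.c * self.residual(h)  # contraction + skip: approx. bi-Lipschitz
        return self.project(h)


class GPHead(gpytorch.models.ApproximateGP):
    """Sparse variational GP whose kernel acts on the extracted features."""

    def __init__(self, inducing_points: torch.Tensor):
        variational_distribution = gpytorch.variational.CholeskyVariationalDistribution(
            inducing_points.size(0)
        )
        variational_strategy = gpytorch.variational.VariationalStrategy(
            self, inducing_points, variational_distribution,
            learn_inducing_locations=True,
        )
        super().__init__(variational_strategy)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, z: torch.Tensor) -> gpytorch.distributions.MultivariateNormal:
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(z), self.covar_module(z)
        )


# Chain the two: uncertainty comes from one deterministic forward pass
# through extractor + GP, with no sampling or ensembling at test time.
extractor = ResidualFeatureExtractor(in_dim=10, feat_dim=4)
with torch.no_grad():
    init_z = extractor(torch.randn(32, 10))  # initialize inducing points in feature space
gp = GPHead(init_z[:16])
likelihood = gpytorch.likelihoods.GaussianLikelihood()

extractor.eval(); gp.eval(); likelihood.eval()
with torch.no_grad():
    pred = likelihood(gp(extractor(torch.randn(5, 10))))
print(pred.mean, pred.variance)  # predictive mean and variance per input
```

In a real setup the extractor and GP would be trained jointly, end to end, by maximizing a variational ELBO (e.g. `gpytorch.mlls.VariationalELBO`) with minibatches, in the spirit of stochastic variational deep kernel learning; a Bernoulli or softmax likelihood would replace the Gaussian one for classification.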
