On Feature Collapse and Deep Kernel Learning for Single Forward Pass Uncertainty

Gaussian processes are often considered a gold standard in uncertainty estimation on low-dimensional data, but they scale poorly to high-dimensional inputs. Deep Kernel Learning (DKL) was introduced to address this: a deep feature extractor transforms the inputs over which the Gaussian process kernel is defined. In practice, however, DKL has been shown to produce unreliable uncertainty estimates. We study why, and show that for certain feature extractors, “far-away” data points are mapped to the same features as training-set points. With this insight, we propose to constrain DKL’s feature extractor to approximately preserve distances through a bi-Lipschitz constraint, yielding a feature space favorable to DKL. The resulting model, DUE, provides uncertainty estimates that outperform previous DKL and single-forward-pass methods, while retaining the speed and accuracy of standard softmax neural networks.
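To make the construction concrete, below is a minimal sketch of the two ingredients the abstract describes: a feature extractor regularized toward bi-Lipschitz behavior, and a sparse variational Gaussian process defined on its output features. This is not the authors' released implementation; it assumes PyTorch and GPyTorch, and all class and variable names (ResidualFeatureExtractor, GPHead, etc.) are illustrative. Note one simplification: `torch.nn.utils.spectral_norm` normalizes each weight's spectral norm to exactly 1, whereas the paper softly bounds it below a coefficient, so here the residual branch is scaled by a constant c < 1 to obtain the lower-Lipschitz bound.

```python
import torch
import torch.nn as nn
import gpytorch


class ResidualFeatureExtractor(nn.Module):
    """Toy extractor: residual blocks with spectral-normalized inner layers.

    With the residual branch a contraction (scaled by c < 1), the map
    h -> h + c * f(h) has sensitivity bounded below as well as above,
    i.e. it is approximately bi-Lipschitz, which is the property that
    discourages feature collapse. The dimension-changing lift/project
    layers are only Lipschitz from above (a simplification of DUE).
    """

    def __init__(self, in_dim: int, feat_dim: int, width: int = 128, c: float = 0.9):
        super().__init__()
        self.c = c
        self.lift = nn.utils.spectral_norm(nn.Linear(in_dim, width))
        self.residual = nn.Sequential(
            nn.utils.spectral_norm(nn.Linear(width, width)),
            nn.ReLU(),
            nn.utils.spectral_norm(nn.Linear(width, width)),
        )
        self.project = nn.utils.spectral_norm(nn.Linear(width, feat_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.lift(x)
        h = h + self.c * self.residual(h)  # contraction + skip: approx. bi-Lipschitz
        return self.project(h)


class GPHead(gpytorch.models.ApproximateGP):
    """Sparse variational GP whose kernel acts on the extracted features."""

    def __init__(self, inducing_points: torch.Tensor):
        variational_distribution = gpytorch.variational.CholeskyVariationalDistribution(
            inducing_points.size(0)
        )
        variational_strategy = gpytorch.variational.VariationalStrategy(
            self, inducing_points, variational_distribution,
            learn_inducing_locations=True,
        )
        super().__init__(variational_strategy)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, z: torch.Tensor) -> gpytorch.distributions.MultivariateNormal:
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(z), self.covar_module(z)
        )


# Chain the two: uncertainty comes from one deterministic forward pass
# through extractor + GP, with no sampling or ensembling at test time.
extractor = ResidualFeatureExtractor(in_dim=10, feat_dim=4)
with torch.no_grad():
    init_z = extractor(torch.randn(32, 10))  # initialize inducing points in feature space
gp = GPHead(init_z[:16])
likelihood = gpytorch.likelihoods.GaussianLikelihood()

extractor.eval(); gp.eval(); likelihood.eval()
with torch.no_grad():
    pred = likelihood(gp(extractor(torch.randn(5, 10))))
print(pred.mean, pred.variance)  # predictive mean and variance per input
```

In a real setup the extractor and GP would be trained jointly, end to end, by maximizing a variational ELBO (e.g. `gpytorch.mlls.VariationalELBO`) with minibatches, in the spirit of stochastic variational deep kernel learning; a Bernoulli or softmax likelihood would replace the Gaussian one for classification.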
