Uncertainty Estimation with Infinitesimal Jackknife, Its Distribution and Mean-Field Approximation

Uncertainty quantification is an important research area in machine learning. Many approaches have been developed to improve the representation of uncertainty in deep models and to avoid overconfident predictions. Existing ones, such as Bayesian neural networks and ensemble methods, require modifications to the training procedure and are computationally costly for both training and inference. Motivated by this, we propose mean-field infinitesimal jackknife (mfIJ) -- a simple, efficient, and general-purpose plug-in estimator for uncertainty estimation. The main idea is to use the infinitesimal jackknife, a classical statistical tool for uncertainty estimation, to construct a pseudo-ensemble that can be described by a closed-form Gaussian distribution, without retraining. We then use this Gaussian distribution for uncertainty estimation. While the standard approach is to sample models from this distribution and average their predictions, we develop a mean-field approximation to the inference step, in which Gaussian random variables must be integrated against the softmax nonlinearity to produce probabilities for the multinomial output. The approach has many appealing properties: it functions as an ensemble without requiring multiple models, and it enables closed-form approximate inference using only the first and second moments of the Gaussian. Empirically, mfIJ performs competitively against state-of-the-art methods, including deep ensembles, temperature scaling, dropout, and Bayesian NNs, on important uncertainty tasks; in particular, it outperforms many of these methods on out-of-distribution detection.
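To make the two steps above concrete, the sketch below (not the authors' released code) illustrates, under stated assumptions, how an infinitesimal-jackknife pseudo-ensemble can be summarized by a Gaussian over parameters and how a Gaussian over logits can be pushed through the softmax in closed form. The function names, the damping term, and the probit-style scaling softmax(mu / sqrt(1 + pi*var/8)) are illustrative choices of ours; the paper's exact mean-field formula, and how the parameter Gaussian is propagated to the logits, may differ.

    import numpy as np

    def ij_gaussian(theta_hat, grads, hessian, damping=1e-3):
        """Gaussian N(theta_hat, Sigma) summarizing the IJ pseudo-ensemble.

        Assumes each leave-one-out estimate is approximated by a Newton step,
        theta_{-i} ~= theta_hat + H^{-1} g_i / n, so the pseudo-ensemble
        covariance is H^{-1} (sum_i g_i g_i^T) H^{-1} / n^2.
        grads: (n, d) per-example loss gradients; hessian: (d, d) training-loss Hessian.
        """
        n, d = grads.shape
        h = hessian + damping * np.eye(d)      # regularize so the solve is well posed
        h_inv_g = np.linalg.solve(h, grads.T)  # (d, n): columns are H^{-1} g_i
        sigma = h_inv_g @ h_inv_g.T / n**2
        return theta_hat, sigma

    def mean_field_softmax(mu, var):
        """Approximate E[softmax(z)] for logits z ~ N(mu, diag(var)).

        Uses a probit-style scaling of the logit means by their variances;
        a moment-based mean-field formula could be substituted here.
        """
        scaled = mu / np.sqrt(1.0 + np.pi * var / 8.0)
        scaled = scaled - scaled.max()         # shift for numerical stability
        p = np.exp(scaled)
        return p / p.sum()

In this sketch, a single trained model plus per-example gradients and a (damped) Hessian yield the Gaussian, and predictions use only the logit means and variances, so no sampling or retraining is needed.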
