Estimating Model Uncertainty of Neural Networks in Sparse Information Form

We present a sparse representation of model uncertainty for Deep Neural Networks (DNNs) in which the parameter posterior is approximated with an inverse formulation of the Multivariate Normal Distribution (MND), also known as the information form. The key insight of our work is that the information matrix, i.e., the inverse of the covariance matrix, tends to be sparse in its spectrum. Dimensionality reduction techniques such as low-rank approximations (LRA) can therefore be exploited effectively. To this end, we develop a novel sparsification algorithm and derive a cost-effective analytical sampler. As a result, we show that the information form can be applied scalably to represent model uncertainty in DNNs. Our exhaustive theoretical analysis and empirical evaluations on various benchmarks show that our approach is competitive with current methods.
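To make the idea concrete, the following is a minimal NumPy sketch, not the paper's algorithm. It assumes the information matrix is approximated as U diag(s) U^T + tau * I, where U holds the top-k eigenvectors, s the corresponding eigenvalues, and tau > 0 is a hypothetical isotropic precision term introduced here so that the matrix is invertible. Under that assumption, a square root of the covariance has a closed form, and each sample costs O(N k) instead of O(N^3). All function and parameter names are illustrative.

```python
import numpy as np


def low_rank_information(info, rank):
    """Keep the `rank` largest eigenpairs of a symmetric PSD information matrix."""
    eigvals, eigvecs = np.linalg.eigh(info)      # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:rank]       # indices of the largest ones
    return eigvals[idx], eigvecs[:, idx]


def transform_noise(z, eigvals, eigvecs, tau):
    """Map rows of standard normal noise z to zero-mean draws with covariance
    (U diag(eigvals) U^T + tau * I)^-1, using
    A = I / sqrt(tau) + U diag(1/sqrt(s + tau) - 1/sqrt(tau)) U^T  with  A A^T = covariance.
    """
    scale = 1.0 / np.sqrt(eigvals + tau) - 1.0 / np.sqrt(tau)
    proj = z @ eigvecs                           # shape (n_samples, rank)
    return z / np.sqrt(tau) + (proj * scale) @ eigvecs.T


def sample_information_form(mean, eigvals, eigvecs, tau, n_samples, rng):
    """Draw n_samples parameter vectors around `mean` (assumed posterior mode)."""
    z = rng.standard_normal((n_samples, mean.shape[0]))
    return mean + transform_noise(z, eigvals, eigvecs, tau)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 50
    # Toy information matrix with a rapidly decaying spectrum (an assumption).
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    spectrum = 100.0 * 0.5 ** np.arange(n)
    info = (q * spectrum) @ q.T
    vals, vecs = low_rank_information(info, rank=5)
    draws = sample_information_form(np.zeros(n), vals, vecs, tau=1.0,
                                    n_samples=3, rng=rng)
    print(draws.shape)
```

The sampler never forms an N x N matrix: it only stores U (N x k) and the k retained eigenvalues, which is what makes a spectrally sparse information matrix attractive for large parameter vectors.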
