Latent Projection BNNs: Avoiding weight-space pathologies by learning latent representations of neural network weights

As machine learning systems are increasingly adopted for high-stakes decisions, quantifying the uncertainty of their predictions becomes crucial. While modern neural networks continue to make remarkable gains in predictive accuracy, characterizing uncertainty over their parameters remains challenging because the parameter space is high-dimensional and its dimensions are correlated in complex ways. This paper introduces a novel variational inference framework for Bayesian neural networks that (1) encodes complex distributions over the high-dimensional parameter space using representations in a low-dimensional latent space, and (2) performs inference efficiently on those low-dimensional representations. Across a large array of synthetic and real-world datasets, we show that our method improves uncertainty characterization and model generalization compared with methods that work directly in the parameter space.
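The two-step idea above can be illustrated with a small sketch. It is only an assumption-laden toy, not the paper's implementation: the network sizes, the linear decoder (`proj`, `bias`), the diagonal Gaussian over the latent variable, and all function names are hypothetical choices made for illustration. A latent vector `z` is sampled from a low-dimensional distribution, mapped to a full weight vector, and the spread of the resulting predictions serves as an uncertainty estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a one-hidden-layer regression network and a 2-D latent space.
D_IN, D_HID, D_OUT, D_Z = 1, 20, 1, 2
N_WEIGHTS = D_IN * D_HID + D_HID + D_HID * D_OUT + D_OUT  # 61 weights in total

# Assumed decoder: a fixed random linear map from latent z to a flat weight vector.
# In a real method this map would be learned; here it is random for illustration.
proj = rng.normal(scale=0.5, size=(N_WEIGHTS, D_Z))
bias = rng.normal(scale=0.1, size=N_WEIGHTS)


def decode_weights(z):
    """Map a low-dimensional latent vector to a full network weight vector."""
    return proj @ z + bias


def unpack(w):
    """Split a flat weight vector into layer matrices and bias vectors."""
    i = 0
    W1 = w[i:i + D_IN * D_HID].reshape(D_IN, D_HID)
    i += D_IN * D_HID
    b1 = w[i:i + D_HID]
    i += D_HID
    W2 = w[i:i + D_HID * D_OUT].reshape(D_HID, D_OUT)
    i += D_HID * D_OUT
    b2 = w[i:i + D_OUT]
    return W1, b1, W2, b2


def predict(x, w):
    """Forward pass of the small tanh network for inputs x of shape (n, D_IN)."""
    W1, b1, W2, b2 = unpack(w)
    return np.tanh(x @ W1 + b1) @ W2 + b2


def sample_predictions(x, q_mean, q_log_std, n_samples=100):
    """Sample z from a diagonal Gaussian q(z), decode to weights, and predict."""
    eps = rng.normal(size=(n_samples, D_Z))
    zs = q_mean + np.exp(q_log_std) * eps  # reparameterized samples of z
    return np.stack([predict(x, decode_weights(z)) for z in zs])


x = np.linspace(-2, 2, 5).reshape(-1, 1)
preds = sample_predictions(x, q_mean=np.zeros(D_Z), q_log_std=np.zeros(D_Z))
pred_mean, pred_std = preds.mean(axis=0), preds.std(axis=0)
```

In this toy, the distribution over 61 weights is controlled by only four variational parameters (the mean and log standard deviation of a 2-D `z`), which is the sense in which inference happens in the latent space rather than directly in weight space.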
