Abstract Neural networks are state-of-the-art models for many machine learning problems. They are often trained via back-propagation to find weight values that correctly predict the observed data. Back-propagation performs well in many applications, but it cannot easily output an estimate of the uncertainty in the predictions made. Estimating this uncertainty is critical in many applications. One way to obtain it is to follow a Bayesian approach and compute a posterior distribution over the model parameters, which summarizes the parameter values that are compatible with the observed data. However, this posterior is often intractable and has to be approximated, and several methods have been devised for this task. Here, we propose a general method for approximate Bayesian inference that is based on minimizing α-divergences and that allows for flexible approximate distributions. We call this method adversarial α-divergence minimization (AADM). We have evaluated AADM in the context of Bayesian neural networks. Extensive experiments show that it can lead to better results in terms of the test log-likelihood, and sometimes in terms of the squared error, in regression problems. In classification problems, AADM gives competitive results.
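The abstract does not define the α-divergence being minimized. For orientation, a standard form is Amari's α-divergence between the true posterior p and the approximation q; the sketch below assumes the paper uses this or a closely related parameterization:

\[
  D_{\alpha}(p \,\|\, q)
  = \frac{1}{\alpha(1-\alpha)}
    \left( 1 - \int p(\theta)^{\alpha}\, q(\theta)^{1-\alpha}\, d\theta \right).
\]

In the limits this recovers the two Kullback-Leibler divergences, with α → 1 giving KL(p ‖ q) and α → 0 giving KL(q ‖ p), so the parameter α interpolates between mass-covering and mode-seeking approximations.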