Ensemble learning in Bayesian neural networks

Bayesian treatments of learning in neural networks are typically based either on a local Gaussian approximation to a mode of the posterior weight distribution, or on Markov chain Monte Carlo simulations. A third approach, called ensemble learning, was introduced by Hinton and van Camp (1993). It aims to approximate the posterior distribution by minimizing the Kullback-Leibler divergence between the true posterior and a parametric approximating distribution. The original derivation of a deterministic algorithm relied on the use of a Gaussian approximating distribution with a diagonal covariance matrix and hence was unable to capture the posterior correlations between parameters. In this chapter we show how the ensemble learning approach can be extended to full-covariance Gaussian distributions while remaining computationally tractable. We also extend the framework to deal with hyperparameters, leading to a simple re-estimation procedure. One of the benefits of our approach is that it yields a strict lower bound on the marginal likelihood, in contrast to other approximate procedures.
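
To make the objective concrete, the bound that ensemble learning optimizes can be sketched in standard variational notation (this sketch uses generic notation and is not quoted from the chapter). Writing w for the network weights and D for the data, any approximating distribution Q(w) satisfies

\[
\ln P(D) \;=\; \ln \int P(D \mid w)\, P(w)\, dw \;\geq\; \int Q(w) \ln \frac{P(D \mid w)\, P(w)}{Q(w)}\, dw \;\equiv\; \mathcal{F}[Q],
\]

where the gap is exactly the Kullback-Leibler divergence to the true posterior,

\[
\ln P(D) - \mathcal{F}[Q] \;=\; \mathrm{KL}\big( Q(w) \,\|\, P(w \mid D) \big) \;\geq\; 0.
\]

Maximizing \(\mathcal{F}[Q]\) is therefore equivalent to minimizing this divergence, and \(\mathcal{F}[Q]\) is a strict lower bound on the log marginal likelihood for any choice of Q. Taking \(Q(w) = \mathcal{N}(w \mid \mu, \Sigma)\) with a full (rather than diagonal) covariance matrix \(\Sigma\) is what allows correlations between the weights under the posterior to be captured, while keeping the required expectations tractable as claimed in the abstract.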

[1] A. M. Walker, On the Asymptotic Behaviour of Posterior Distributions, 1969.

[2] D. Rubin et al., Maximum likelihood from incomplete data via the EM algorithm (with discussion), 1977.

[3] James O. Berger, Statistical Decision Theory and Bayesian Analysis, Second Edition, 1985.

[4] A. Kennedy et al., Hybrid Monte Carlo, 1987.

[5] J. Berger, Statistical Decision Theory and Bayesian Analysis, 1988.

[7] David J. C. MacKay, A Practical Bayesian Framework for Backpropagation Networks, Neural Computation, 1992.

[8] Geoffrey E. Hinton and Drew van Camp, Keeping the neural networks simple by minimizing the description length of the weights, COLT '93, 1993.

[9] Radford M. Neal, A new view of the EM algorithm that justifies incremental and other variants, 1993.

[10] Heekuck Oh et al., Neural Networks for Pattern Recognition, Adv. Comput., 1993.

[11] Barak A. Pearlmutter, Fast Exact Multiplication by the Hessian, Neural Computation, 1994.

[12] David J. C. MacKay, Probable networks and plausible predictions - a review of practical Bayesian methods for supervised neural networks, 1995.

[13] Radford M. Neal, Bayesian Learning for Neural Networks, 1995.

[14] David Barber and Christopher M. Bishop, Ensemble Learning for Multi-Layer Networks, NIPS, 1997.

[15] Neil D. Lawrence et al., Approximating Posterior Distributions in Belief Networks Using Mixtures, NIPS, 1997.

[16] David Barber et al., Radial Basis Functions: A Bayesian Treatment, NIPS, 1997.

[17] Christopher M. Bishop, Variational Learning in Graphical Models and Neural Networks, 1998.

[18] Michael I. Jordan (ed.), Learning in Graphical Models, NATO ASI Series, 1999.

[19] David Barber et al., Tractable Undirected Approximations for Graphical Models, 1998.

[20] Neil D. Lawrence et al., Mixture Representations for Inference and Learning in Boltzmann Machines, UAI, 1998.