Wide Mean-Field Variational Bayesian Neural Networks Ignore the Data

Variational inference enables approximate posterior inference of the highly over-parameterized neural networks that are popular in modern machine learning. Unfortunately, such posteriors are known to exhibit various pathological behaviors. We prove that as the number of hidden units in a single-layer Bayesian neural network tends to infinity, the function-space posterior mean under mean-field variational inference actually converges to zero, completely ignoring the data. This is in contrast to the true posterior, which converges to a Gaussian process. Our work provides insight into the over-regularization of the KL divergence in variational inference.
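
To make the width effect concrete, here is a minimal, self-contained sketch (not the paper's code) under illustrative assumptions: it fits a fully factorized Gaussian variational posterior to a one-hidden-layer Bayesian neural network by stochastic optimization of the ELBO, then reports how the norm of the approximate posterior predictive mean shrinks toward zero (the prior mean) as the hidden width H grows. The dataset, priors, noise level, and optimizer settings are all hypothetical, and a short optimization run only approximates the optimal mean-field posterior analyzed in the paper.

```python
# Sketch: mean-field variational inference (MFVI) for a one-hidden-layer BNN,
#   f(x) = (1 / sqrt(H)) * sum_h v_h * tanh(w_h * x + b_h),
# with N(0, 1) priors on all weights. As the width H grows, the fitted
# posterior predictive mean tends toward the prior mean (zero), even though
# the targets are clearly non-zero. All settings below are illustrative.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Tiny 1-D regression dataset with a clearly non-zero target function.
x = torch.linspace(-2.0, 2.0, 20).unsqueeze(1)         # shape (N, 1)
y = torch.sin(x) + 0.05 * torch.randn_like(x)          # shape (N, 1)
noise_var = 0.05 ** 2                                  # assumed known noise


def fit_mfvi_bnn(width, steps=2000, lr=1e-2, num_samples=8):
    """Fit a fully factorized Gaussian posterior by maximizing the ELBO."""
    # Variational parameters: a mean and an (unconstrained) scale per weight.
    params = {}
    for name, shape in [("w", (1, width)), ("b", (width,)), ("v", (width, 1))]:
        params[name + "_mu"] = torch.zeros(shape, requires_grad=True)
        params[name + "_rho"] = torch.full(shape, -3.0, requires_grad=True)
    opt = torch.optim.Adam(params.values(), lr=lr)

    def sample(name):
        # Reparameterized sample from q(theta) = N(mu, softplus(rho)^2).
        mu, std = params[name + "_mu"], F.softplus(params[name + "_rho"])
        return mu + std * torch.randn_like(mu)

    def kl(name):
        # KL( N(mu, std^2) || N(0, 1) ), summed over all entries.
        mu, std = params[name + "_mu"], F.softplus(params[name + "_rho"])
        return 0.5 * (std ** 2 + mu ** 2 - 1.0 - 2.0 * torch.log(std)).sum()

    def forward(xs):
        w, b, v = sample("w"), sample("b"), sample("v")
        return torch.tanh(xs @ w + b) @ v / width ** 0.5   # shape (N, 1)

    for _ in range(steps):
        opt.zero_grad()
        # Monte Carlo estimate of the expected negative log-likelihood.
        nll = sum(0.5 * ((y - forward(x)) ** 2).sum() / noise_var
                  for _ in range(num_samples)) / num_samples
        loss = nll + kl("w") + kl("b") + kl("v")            # negative ELBO
        loss.backward()
        opt.step()

    # Approximate posterior predictive mean at the training inputs.
    with torch.no_grad():
        mean = torch.stack([forward(x) for _ in range(200)]).mean(0)
    return mean


for width in [10, 100, 1000]:
    mean = fit_mfvi_bnn(width)
    print(f"H={width:5d}  ||predictive mean|| = {mean.norm().item():.3f}"
          f"  vs  ||y|| = {y.norm().item():.3f}")
```

The 1/sqrt(H) output scaling is the standard parameterization under which the exact posterior of such a network approaches a Gaussian-process limit as H grows, which is the regime the paper contrasts against.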
