On Modern Deep Learning and Variational Inference

Bayesian modelling and variational inference are rooted in Bayesian statistics, and easily benefit from the vast literature in the field. In contrast, deep learning lacks a solid mathematical grounding. Instead, empirical developments in deep learning are often justified by metaphors, evading the unexplained principles at play. It is perhaps astonishing, then, that most modern deep learning models can be cast as performing approximate variational inference in a Bayesian setting. This mathematically grounded result, studied in Gal and Ghahramani [1] for deep neural networks (NNs), is extended here to arbitrary deep learning models. The implications of this statement are profound: we can use the rich Bayesian statistics literature with deep learning models, explain away many of the curiosities observed with these models, bring results from deep learning into Bayesian modelling, and much more. We demonstrate the practical impact of the framework on image classification by combining Bayesian and deep learning techniques, obtaining new state-of-the-art results, and survey open problems for future research. These stand at the forefront of a new and exciting field combining modern deep learning and Bayesian techniques.
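
As a concrete illustration of the kind of technique this framework grounds, below is a minimal sketch (not taken from the paper) of Monte Carlo dropout for classification: dropout is kept active at test time and several stochastic forward passes are averaged, which the dropout-as-variational-inference view [6, 11] interprets as approximating the Bayesian predictive distribution. The network shape, dropout rate, and number of samples T are illustrative assumptions, not choices prescribed by the paper.

```python
# Minimal MC-dropout sketch (illustrative only). Each forward pass with
# dropout noise switched on corresponds to one sample from the approximate
# posterior over weights; averaging passes approximates the predictive mean.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DropoutMLP(nn.Module):
    def __init__(self, d_in=784, d_hidden=256, d_out=10, p=0.5):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_out)
        self.p = p

    def forward(self, x):
        # training=True keeps the Bernoulli dropout noise on even at test time.
        h = F.dropout(F.relu(self.fc1(x)), p=self.p, training=True)
        return self.fc2(h)

@torch.no_grad()
def mc_dropout_predict(model, x, T=50):
    """Average T stochastic forward passes; the spread across passes gives a
    simple estimate of the model's predictive uncertainty."""
    probs = torch.stack([F.softmax(model(x), dim=-1) for _ in range(T)])
    return probs.mean(0), probs.std(0)

# Usage on random inputs, purely for illustration:
model = DropoutMLP()
x = torch.randn(4, 784)
mean, std = mc_dropout_predict(model, x)
```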

[1] Charles Blundell, et al. Weight Uncertainty in Neural Networks, 2015, ArXiv.

[2] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.

[3] Yarin Gal and Zoubin Ghahramani. Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference, 2015, ArXiv.

[4] Alex Graves. Practical Variational Inference for Neural Networks, 2011, NIPS.

[5] David Barber and Christopher M. Bishop. Ensemble learning in Bayesian neural networks, 1998.

[6] Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, 2015, ICML.

[7] Alex Krizhevsky. Learning Multiple Layers of Features from Tiny Images, 2009.

[8] Pierre Baldi, et al. Understanding Dropout, 2013, NIPS.

[9] Yangqing Jia, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.

[10] Stefan Wager, et al. Dropout Training as Adaptive Regularization, 2013, NIPS.

[11] Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian Approximation: Insights and Applications, 2015.

[12] Li Wan, et al. Regularization of Neural Networks using DropConnect, 2013, ICML.

[13] Yarin Gal and Richard E. Turner. Improving the Gaussian Process Sparse Spectrum Approximation by Representing Uncertainty in Frequency Inputs, 2015, ICML.

[14] Yann LeCun, et al. The MNIST database of handwritten digits, 2005.

[15] Geoffrey E. Hinton, et al. Improving neural networks by preventing co-adaptation of feature detectors, 2012, ArXiv.

[16] Yann LeCun, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.

[17] Geoffrey E. Hinton, et al. Keeping the neural networks simple by minimizing the description length of the weights, 1993, COLT.

[18] Alex Krizhevsky, et al. ImageNet classification with deep convolutional neural networks, 2012, NIPS.

[19] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning), 2005.