Concrete Dropout

Dropout is used as a practical tool to obtain uncertainty estimates in large vision models and reinforcement learning (RL) tasks. But to obtain well-calibrated uncertainty estimates, a grid search over the dropout probabilities is necessary, a prohibitive operation with large models and an impossible one in RL. We propose a new dropout variant which gives improved performance and better-calibrated uncertainties. Relying on recent developments in Bayesian deep learning, we use a continuous relaxation of dropout's discrete masks. Together with a principled optimisation objective, this allows the dropout probability to be tuned automatically in large models, and as a result enables faster experimentation cycles. In RL, this allows the agent to adapt its uncertainty dynamically as more data is observed. We analyse the proposed variant extensively on a range of tasks and give insights into common practice in the field, where larger dropout probabilities are often used in deeper model layers.
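To make the mechanism concrete, below is a minimal NumPy sketch of the two ingredients the abstract refers to: a Concrete (continuous) relaxation of the Bernoulli dropout mask, which keeps the mask differentiable in the dropout probability p, and a per-layer regulariser whose weight-penalty and entropy terms let p be tuned by gradient descent alongside the network weights. The function names, temperature, and regularisation scales are illustrative placeholders, not the authors' reference implementation; the regulariser only follows its general form (a weight penalty scaled by 1/(1 - p) plus a dropout-entropy term).

import numpy as np

def concrete_dropout_mask(p, shape, temperature=0.1, eps=1e-7, rng=None):
    # Concrete relaxation of a Bernoulli(1 - p) keep-mask: uniform noise is
    # pushed through a temperature-scaled sigmoid, so the mask is near-binary
    # yet differentiable with respect to the dropout probability p.
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(size=shape)
    logits = (np.log(p + eps) - np.log(1.0 - p + eps)
              + np.log(u + eps) - np.log(1.0 - u + eps))
    drop_prob = 1.0 / (1.0 + np.exp(-logits / temperature))
    return 1.0 - drop_prob

def concrete_dropout_regulariser(p, weights, weight_reg=1e-6, dropout_reg=1e-5):
    # Per-layer regulariser: a weight penalty that grows as p increases,
    # plus the negative entropy of the dropout distribution. Optimising this
    # jointly with the data-fit term tunes p automatically; the scaling
    # convention depends on how the weights are parameterised.
    kernel_term = weight_reg * np.sum(weights ** 2) / (1.0 - p)
    neg_entropy = p * np.log(p) + (1.0 - p) * np.log(1.0 - p)
    return kernel_term + dropout_reg * weights.shape[0] * neg_entropy

# Usage: apply the relaxed mask to a layer's inputs, rescaling as in
# standard (inverted) dropout, and add the regulariser to the training loss.
x = np.random.randn(4, 8)
W = np.random.randn(8, 16) * 0.1
p = 0.2
h = (x * concrete_dropout_mask(p, x.shape) / (1.0 - p)) @ W
reg = concrete_dropout_regulariser(p, W)

At test time, sampling several such masks and averaging the predictions yields the Monte Carlo dropout uncertainty estimates that the abstract refers to, with p set by the training objective rather than by grid search.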
