Learnable Bernoulli Dropout for Bayesian Deep Learning

In this work, we propose learnable Bernoulli dropout (LBD), a new model-agnostic dropout scheme that treats the dropout rates as parameters jointly optimized with the other model parameters. By modeling Bernoulli dropout probabilistically, our method enables more robust prediction and uncertainty quantification in deep models. In particular, when combined with variational autoencoders (VAEs), LBD enables flexible semi-implicit posterior representations, leading to new semi-implicit VAE (SIVAE) models. We optimize the dropout parameters during training with Augment-REINFORCE-Merge (ARM), an unbiased and low-variance gradient estimator. Experiments on a range of tasks show the superior performance of our approach compared with other commonly used dropout schemes. Overall, LBD yields improved accuracy and uncertainty estimates in image classification and semantic segmentation, and with SIVAE it achieves state-of-the-art performance on collaborative filtering for implicit feedback on several public datasets.
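
At its core, LBD needs the gradient of the expected loss with respect to the Bernoulli dropout logits, which the reparameterization trick cannot provide for discrete masks. ARM sidesteps this by evaluating the loss on two antithetic masks built from shared uniform noise: for z ~ Bernoulli(sigmoid(phi)), the gradient of E[f(z)] with respect to phi equals E_u[(f(1[u > sigmoid(-phi)]) - f(1[u < sigmoid(phi)])) (u - 1/2)] with u ~ Uniform(0, 1). Below is a minimal NumPy sketch of a single-sample version of this estimator; the function names and the toy loss are our own illustration, not code from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def arm_gradient(f, phi, rng):
    """Single-sample ARM estimate of d/dphi E_{z ~ Bernoulli(sigmoid(phi))}[f(z)].

    f   : loss as a function of a binary dropout mask (array -> scalar)
    phi : logits of the keep probabilities, one entry per unit
    """
    u = rng.uniform(size=phi.shape)            # shared uniform noise for both arms
    z1 = (u > sigmoid(-phi)).astype(float)     # antithetic mask (equivalent to noise 1 - u)
    z2 = (u < sigmoid(phi)).astype(float)      # ordinary Bernoulli(sigmoid(phi)) mask
    return (f(z1) - f(z2)) * (u - 0.5)         # unbiased estimate of the gradient

# Toy usage: learn keep probabilities for a fixed linear "network".
rng = np.random.default_rng(0)
w = np.array([1.0, -1.0, 2.0])                 # illustrative fixed weights
phi = np.zeros(3)                              # dropout logits; keep probability starts at 0.5
loss = lambda z: -np.dot(w, z)                 # lower loss when useful units are kept
for _ in range(2000):
    phi -= 0.1 * arm_gradient(loss, phi, rng)  # SGD on the expected loss
print(sigmoid(phi))                            # keep probabilities approach 1 where w > 0
```

In the full method, these logits would be trained jointly with the network weights by the same optimizer (one logit per layer, channel, or neuron, depending on the parameterization); the sketch above isolates only the gradient estimator.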
