Hands-On Bayesian Neural Networks—A Tutorial for Deep Learning Users

Modern deep learning methods are powerful tools for tackling a wide range of challenging problems. However, because they operate as black boxes, the uncertainty associated with their predictions is often hard to quantify. Bayesian statistics offers a formalism for understanding and quantifying the uncertainty associated with deep neural network predictions. This tutorial provides deep learning practitioners with an overview of the relevant literature and a complete toolset to design, implement, train, use, and evaluate Bayesian neural networks, i.e., stochastic artificial neural networks trained using Bayesian methods.
