Infinitely Deep Bayesian Neural Networks with Stochastic Differential Equations

We perform scalable approximate inference in a recently proposed family of continuous-depth Bayesian neural networks. In this model class, uncertainty about the separate weights in each layer produces dynamics that follow a stochastic differential equation (SDE). We demonstrate gradient-based stochastic variational inference in this infinite-parameter setting, producing arbitrarily flexible approximate posteriors. We also derive a novel gradient estimator that approaches zero variance as the approximate posterior approaches the true posterior. This approach further inherits the memory-efficient training and tunable precision of neural ODEs.
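
To make the setup concrete, here is a minimal Python/PyTorch sketch of the kind of model and inference scheme the abstract describes; it is an illustration, not the authors' implementation. It assumes a Brownian-motion prior on the weight process (zero drift), a learned posterior drift network, a fixed-step Euler-Maruyama solver, and a shared diffusion scale. Under these assumptions, Girsanov's theorem gives the KL divergence between the posterior and prior weight-path measures as half the time integral of the squared drift difference over the diffusion, yielding a one-sample stochastic ELBO. All names here (f, posterior_drift, SIGMA) are hypothetical.

import torch
import torch.nn as nn

SIGMA = 0.1              # diffusion scale, shared by prior and posterior (assumption)
STEPS, T = 20, 1.0       # Euler-Maruyama steps and terminal time
DT = T / STEPS
HIDDEN_DIM, WEIGHT_DIM = 8, 16

# Hypothetical networks: f maps (hidden state, weights) to dh/dt; the posterior
# drift replaces the zero drift of the Brownian-motion prior on the weights.
f = nn.Sequential(nn.Linear(HIDDEN_DIM + WEIGHT_DIM, HIDDEN_DIM), nn.Tanh())
posterior_drift = nn.Sequential(nn.Linear(WEIGHT_DIM + 1, WEIGHT_DIM), nn.Tanh())

def elbo(x, y, log_likelihood):
    """One-sample stochastic ELBO: E_q[log p(y | h_T)] - KL(q || p) over weight paths."""
    h = x
    w = torch.zeros(x.shape[0], WEIGHT_DIM)
    kl = torch.zeros(x.shape[0])
    for i in range(STEPS):
        t = torch.full((x.shape[0], 1), i * DT)
        u = posterior_drift(torch.cat([w, t], dim=-1))       # posterior drift at (w, t)
        # Girsanov: with a zero prior drift, the path-space KL accumulates
        # 0.5 * ||u / sigma||^2 dt along the sampled weight trajectory.
        kl = kl + 0.5 * (u / SIGMA).pow(2).sum(dim=-1) * DT
        w = w + u * DT + SIGMA * DT ** 0.5 * torch.randn_like(w)  # weight SDE step
        h = h + f(torch.cat([h, w], dim=-1)) * DT                 # hidden-state step
    return log_likelihood(h, y) - kl

# Usage sketch with a Gaussian-style likelihood (illustrative only):
x = torch.randn(32, HIDDEN_DIM)
y = torch.randn(32, HIDDEN_DIM)
loss = -elbo(x, y, lambda h, t: -((h - t) ** 2).sum(dim=-1)).mean()
loss.backward()

In practice, the fixed-step loop above would presumably be replaced by an adaptive SDE solver with adjoint-based gradients (e.g., a library such as torchsde) to obtain the memory-efficient training and tunable precision mentioned in the abstract, and the paper's low-variance gradient estimator would replace this naive one-sample ELBO estimate.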
