Scalable Gradients for Stochastic Differential Equations

The adjoint sensitivity method scalably computes gradients of solutions to ordinary differential equations. We generalize this method to stochastic differential equations, allowing time-efficient and constant-memory computation of gradients with high-order adaptive solvers. Specifically, we derive a stochastic differential equation whose solution is the gradient, a memory-efficient algorithm for caching noise, and conditions under which numerical solutions converge. In addition, we combine our method with gradient-based stochastic variational inference for latent stochastic differential equations. We use our method to fit stochastic dynamics defined by neural networks, achieving competitive performance on a 50-dimensional motion capture dataset.
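
To make the setting concrete, below is a minimal sketch (not the authors' released code) of fitting a neural SDE with adjoint-based gradients. It assumes the torchsde package and its sdeint_adjoint interface; the NeuralSDE module, network sizes, data, and hyperparameters are illustrative placeholders.

```python
import torch
import torchsde


class NeuralSDE(torch.nn.Module):
    noise_type = "diagonal"     # one independent Brownian motion per state dimension
    sde_type = "stratonovich"   # the stochastic adjoint is stated for Stratonovich SDEs

    def __init__(self, dim=3, hidden=64):
        super().__init__()
        self.drift = torch.nn.Sequential(
            torch.nn.Linear(dim, hidden), torch.nn.Tanh(), torch.nn.Linear(hidden, dim))
        self.diffusion = torch.nn.Sequential(
            torch.nn.Linear(dim, hidden), torch.nn.Tanh(), torch.nn.Linear(hidden, dim))

    def f(self, t, y):          # drift term, parameterized by a neural network
        return self.drift(y)

    def g(self, t, y):          # diagonal diffusion term, same shape as y
        return self.diffusion(y)


sde = NeuralSDE()
y0 = torch.zeros(16, 3)                    # batch of initial states
ts = torch.linspace(0.0, 1.0, 20)          # observation times
targets = torch.randn(20, 16, 3)           # placeholder observations

optimizer = torch.optim.Adam(sde.parameters(), lr=1e-3)
for step in range(10):
    optimizer.zero_grad()
    # Forward pass: solve the SDE. Backward pass: solve the adjoint SDE in
    # reverse time, reusing the same noise, so memory stays constant in the
    # number of solver steps.
    ys = torchsde.sdeint_adjoint(sde, y0, ts, method="midpoint", dt=0.05)
    loss = torch.mean((ys - targets) ** 2)
    loss.backward()
    optimizer.step()
```

The squared-error loss here stands in for whatever training objective is used; in the latent SDE setting it would be replaced by a variational bound on the data likelihood.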
