Scalable Gradients for Stochastic Differential Equations

The adjoint sensitivity method scalably computes gradients of solutions to ordinary differential equations. We generalize this method to stochastic differential equations, allowing time-efficient and constant-memory computation of gradients with high-order adaptive solvers. Specifically, we derive a stochastic differential equation whose solution is the gradient, a memory-efficient algorithm for caching noise, and conditions under which numerical solutions converge. In addition, we combine our method with gradient-based stochastic variational inference for latent stochastic differential equations. We use our method to fit stochastic dynamics defined by neural networks, achieving competitive performance on a 50-dimensional motion capture dataset.
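
To make the setting concrete, below is a minimal sketch (not the authors' released code) of fitting a neural SDE with adjoint-based gradients. It assumes the torchsde package and its sdeint_adjoint interface; the NeuralSDE module, network sizes, data, and hyperparameters are illustrative placeholders.

```python
import torch
import torchsde


class NeuralSDE(torch.nn.Module):
    noise_type = "diagonal"     # one independent Brownian motion per state dimension
    sde_type = "stratonovich"   # the stochastic adjoint is stated for Stratonovich SDEs

    def __init__(self, dim=3, hidden=64):
        super().__init__()
        self.drift = torch.nn.Sequential(
            torch.nn.Linear(dim, hidden), torch.nn.Tanh(), torch.nn.Linear(hidden, dim))
        self.diffusion = torch.nn.Sequential(
            torch.nn.Linear(dim, hidden), torch.nn.Tanh(), torch.nn.Linear(hidden, dim))

    def f(self, t, y):          # drift term, parameterized by a neural network
        return self.drift(y)

    def g(self, t, y):          # diagonal diffusion term, same shape as y
        return self.diffusion(y)


sde = NeuralSDE()
y0 = torch.zeros(16, 3)                    # batch of initial states
ts = torch.linspace(0.0, 1.0, 20)          # observation times
targets = torch.randn(20, 16, 3)           # placeholder observations

optimizer = torch.optim.Adam(sde.parameters(), lr=1e-3)
for step in range(10):
    optimizer.zero_grad()
    # Forward pass: solve the SDE. Backward pass: solve the adjoint SDE in
    # reverse time, reusing the same noise, so memory stays constant in the
    # number of solver steps.
    ys = torchsde.sdeint_adjoint(sde, y0, ts, method="midpoint", dt=0.05)
    loss = torch.mean((ys - targets) ** 2)
    loss.backward()
    optimizer.step()
```

The squared-error loss here stands in for whatever training objective is used; in the latent SDE setting it would be replaced by a variational bound on the data likelihood.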
