Efficient and Accurate Gradients for Neural SDEs

Neural SDEs combine many of the best qualities of both RNNs and SDEs, and as such are a natural choice for modelling many types of temporal dynamics. They offer memory efficiency, high-capacity function approximation, and strong priors on model space. Neural SDEs may be trained as VAEs or as GANs; in either case it is necessary to backpropagate through the SDE solve. In particular, this may be done by constructing a backwards-in-time SDE whose solution yields the desired parameter gradients. However, this has previously suffered from severe speed and accuracy issues, due to high computational complexity, numerical errors in the SDE solve, and the cost of reconstructing Brownian motion. Here, we make several technical innovations to overcome these issues. First, we introduce the reversible Heun method: a new SDE solver that is algebraically reversible, which reduces numerical gradient errors to almost zero and improves several test metrics by substantial margins over the state of the art. Moreover, it requires half as many function evaluations as comparable solvers, giving up to a 1.98× speedup. Next, we introduce the Brownian interval: a new and computationally efficient way of exactly sampling and reconstructing Brownian motion, in contrast to previous reconstruction techniques, which are both approximate and relatively slow. This gives up to a 10.6× speed improvement over previous techniques. Finally, when specifically training Neural SDEs as GANs (Kidger et al. 2021), we demonstrate how SDE-GANs may be trained through careful weight clipping and choice of activation function. This reduces computational cost (giving up to a 1.87× speedup) and removes the truncation errors of the double adjoint required for gradient penalty, substantially improving several test metrics. Altogether, these techniques offer substantial improvements over the state of the art, with respect to both training speed and to classification, prediction, and MMD test metrics. We have contributed implementations of all of our techniques to the torchsde library to help facilitate their adoption.
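To make the algebraic reversibility concrete, the following is a minimal sketch of a single reversible Heun step and its exact inverse, assuming an SDE dY = f(t, Y) dt + g(t, Y) dW with diagonal noise. The function names, the toy drift and diffusion, and the plain-PyTorch setting are illustrative only, rather than the torchsde implementation itself, which additionally handles general noise types, adaptive stepping, and the backward-in-time adjoint SDE.

import torch

def reversible_heun_step(f, g, t0, t1, y, yhat, dW):
    # One forward step for dY = f(t, Y) dt + g(t, Y) dW with diagonal noise.
    # The solver state is the pair (y, yhat); in a full solver the evaluations
    # f(t1, yhat_next) and g(t1, yhat_next) would be cached and reused at the
    # next step, which is how the method halves the number of vector field
    # evaluations relative to the standard Heun scheme.
    dt = t1 - t0
    f0, g0 = f(t0, yhat), g(t0, yhat)
    yhat_next = 2 * y - yhat + f0 * dt + g0 * dW
    f1, g1 = f(t1, yhat_next), g(t1, yhat_next)
    y_next = y + 0.5 * (f0 + f1) * dt + 0.5 * (g0 + g1) * dW
    return y_next, yhat_next

def reversible_heun_step_reverse(f, g, t0, t1, y_next, yhat_next, dW):
    # Algebraic inverse of the forward step: recovers (y, yhat) from
    # (y_next, yhat_next) and the same Brownian increment dW, exactly up to
    # floating-point round-off, so the backward pass adds no truncation error.
    dt = t1 - t0
    f1, g1 = f(t1, yhat_next), g(t1, yhat_next)
    yhat = 2 * y_next - yhat_next - f1 * dt - g1 * dW
    f0, g0 = f(t0, yhat), g(t0, yhat)
    y = y_next - 0.5 * (f0 + f1) * dt - 0.5 * (g0 + g1) * dW
    return y, yhat

# Toy check with a hypothetical linear drift and constant diffusion.
f = lambda t, y: -y
g = lambda t, y: 0.1 * torch.ones_like(y)
y0 = yhat0 = torch.randn(4, 3, dtype=torch.float64)
dW = 0.01 ** 0.5 * torch.randn(4, 3, dtype=torch.float64)
y1, yhat1 = reversible_heun_step(f, g, 0.0, 0.01, y0, yhat0, dW)
y0_rec, yhat0_rec = reversible_heun_step_reverse(f, g, 0.0, 0.01, y1, yhat1, dW)
assert torch.allclose(y0_rec, y0) and torch.allclose(yhat0_rec, yhat0)

Because each step can be undone exactly, the backward pass can be run against the same numerical trajectory as the forward pass rather than an approximate reconstruction of it, which is what drives the reduction in numerical gradient error described above.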

[1] Greg Mori, et al. Modeling Continuous Stochastic Processes with Dynamic Normalizing Flows, 2020, NeurIPS.

[2] Terry Lyons, et al. Pathwise approximation of SDEs by coupling piecewise abelian rough paths, 2015, arXiv:1505.01298.

[3] S. Shreve. Stochastic Calculus for Finance II: Continuous-Time Models, 2010.

[4] A. Davie. KMT Theory Applied to Approximations of SDE, 2014.

[5] Patrick Kidger, et al. Signatory: differentiable computations of the signature and logsignature transforms, on both CPU and GPU, 2020, ICLR.

[6] Terry Lyons, et al. Neural SDEs Made Easy: SDEs are Infinite-Dimensional GANs, 2020.

[7] Franz J. Király, et al. Kernels for sequentially ordered data, 2016, J. Mach. Learn. Res.

[8] Ricky T. Q. Chen, et al. Scalable Gradients and Variational Inference for Stochastic Differential Equations, 2019, AABI.

[9] Ed H. Chi, et al. AntisymmetricRNN: A Dynamical System View on Recurrent Neural Networks, 2019, ICLR.

[10] Andrew S. Dickinson, et al. Optimal Approximation of the Second Iterated Integral of Brownian Motion, 2007.

[11] Edward De Brouwer, et al. GRU-ODE-Bayes: Continuous modeling of sporadically-observed time series, 2019, NeurIPS.

[12] Patrick Kidger, et al. Neural SDEs as Infinite-Dimensional GANs, 2021, ICML.

[13] Jing He, et al. Cautionary tales on air-quality improvement in Beijing, 2017, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences.

[14] Edwin V. Bonilla, et al. SigGPDE: Scaling Sparse Gaussian Processes on Sequential Data, 2021, ICML.

[15] Mihaela van der Schaar, et al. Time-series Generative Adversarial Networks, 2019, NeurIPS.

[16] Kenji Doya, et al. Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning, 2017, Neural Networks.

[17] Koen Claessen, et al. Splittable pseudorandom number generators using cryptographic hashing, 2013, Haskell '13.

[18] Andrew Gordon Wilson, et al. Averaging Weights Leads to Wider Optima and Better Generalization, 2018, UAI.

[19] Allan Pinkus, et al. Approximation theory of the MLP model in neural networks, 1999, Acta Numerica.

[20] Jaakko Lehtinen, et al. Analyzing and Improving the Image Quality of StyleGAN, 2020, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21] Pietro Lio, et al. Neural ODE Processes, 2021, ICLR.

[22] Stefan Winkler, et al. The Unusual Effectiveness of Averaging in GAN Training, 2018, ICLR.

[23] T. Faniran. Numerical Solution of Stochastic Differential Equations, 2015.

[24] Maxim Raginsky, et al. Neural Stochastic Differential Equations: Deep Latent Gaussian Models in the Diffusion Limit, 2019, ArXiv.

[25] Sekhar Tatikonda, et al. MALI: A memory efficient and reverse accurate integrator for Neural ODEs, 2021, ICLR.

[26] David Duvenaud, et al. Latent Ordinary Differential Equations for Irregularly-Sampled Time Series, 2019, NeurIPS.

[27] Terry Lyons, et al. The Signature Kernel Is the Solution of a Goursat PDE, 2020, SIAM J. Math. Data Sci.

[28] M. Yor, et al. Continuous martingales and Brownian motion, 1990.

[29] Aaron C. Courville, et al. Improved Training of Wasserstein GANs, 2017, NIPS.

[30] C. Holt. Author's retrospective on ‘Forecasting seasonals and trends by exponentially weighted moving averages’, 2004.

[31] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, ArXiv.

[32] Patrick Kidger, et al. Neural Rough Differential Equations for Long Time Series, 2021, ICML.

[33] David Duvenaud, et al. Residual Flows for Invertible Generative Modeling, 2019, NeurIPS.

[34] M. Wiktorsson. Joint characteristic function and simultaneous simulation of iterated Itô integrals for multiple independent Brownian motions, 2001.

[35] Jessica G. Gaines, et al. Random Generation of Stochastic Area Integrals, 1994, SIAM J. Appl. Math.

[36] Brownian bridge expansions for Lévy area approximations and particular values of the Riemann zeta function, 2021, ArXiv.

[37] Cheng-Yan Kao, et al. A stochastic differential equation model for quantifying transcriptional regulatory network in Saccharomyces cerevisiae, 2005, Bioinformatics.

[38] Andreas Griewank, et al. Achieving logarithmic growth of temporal and spatial complexity in reverse automatic differentiation, 1992.

[39] Richard S. Hamilton, et al. The inverse function theorem of Nash and Moser, 1982.

[40] Richard S. Zemel, et al. Generative Moment Matching Networks, 2015, ICML.

[41] T. Huillet. On Wright–Fisher diffusion and its relatives, 2007.

[42] Mathieu Blondel, et al. Momentum Residual Neural Networks, 2021, ICML.

[43] Andreas Rößler, et al. Runge-Kutta Methods for the Strong Approximation of Solutions of Stochastic Differential Equations, 2010, SIAM J. Numer. Anal.

[44] Austin R. Benson, et al. Neural Jump Stochastic Differential Equations, 2019, NeurIPS.

[45] Jimeng Sun, et al. SDE-Net: Equipping Deep Neural Networks with Uncertainty Estimates, 2020, ICML.

[46] Xiling Zhang, et al. On numerical approximations for stochastic differential equations, 2017.

[47] Bernhard Schölkopf, et al. A Kernel Two-Sample Test, 2012, J. Mach. Learn. Res.

[48] Mark A. Moraes, et al. Parallel random numbers: As easy as 1, 2, 3, 2011, International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[49] Calypso Herrera, et al. Neural Jump Ordinary Differential Equations: Consistent Continuous-Time Prediction and Filtering, 2020, ICLR.

[50] Jessica G. Gaines, et al. Variable Step Size Control in the Numerical Solution of Stochastic Differential Equations, 1997, SIAM J. Appl. Math.

[51] Patrick Kidger, et al. Universal Approximation with Deep Narrow Networks, 2019, COLT.

[52] W. Coffey, et al. The Langevin equation: with applications to stochastic problems in physics, chemistry, and electrical engineering, 2012.

[53] James Foster, et al. An Optimal Polynomial Approximation of Brownian Motion, 2019, SIAM J. Numer. Anal.

[54] F. Black, et al. The Pricing of Options and Corporate Liabilities, 1973, Journal of Political Economy.

[55] Cho-Jui Hsieh, et al. Neural SDE: Stabilizing Neural ODE Networks with Stochastic Noise, 2019, ArXiv.

[56] Peter R. Winters, et al. Forecasting Sales by Exponentially Weighted Moving Averages, 1960.

[57] Quoc V. Le, et al. Swish: a Self-Gated Activation Function, 2017, arXiv:1710.05941.

[58] David Duvenaud, et al. Neural Ordinary Differential Equations, 2018, NeurIPS.

[59] Terry Lyons, et al. Neural Controlled Differential Equations for Irregular Time Series, 2020, NeurIPS.

[60] Abhishek Kumar, et al. Score-Based Generative Modeling through Stochastic Differential Equations, 2020, ICLR.

[61] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[62] M. Arató. A famous nonlinear stochastic equation (Lotka-Volterra model with diffusion), 2003.

[63] Léon Bottou, et al. Wasserstein Generative Adversarial Networks, 2017, ICML.

[64] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.

[65] Maxim Raginsky, et al. Theoretical guarantees for sampling and inference in generative models with latent diffusions, 2019, COLT.

[66] Markus Heinonen, et al. ODE2VAE: Deep generative second order ODEs with Bayesian neural networks, 2019, NeurIPS.

[67] Kevin Gimpel, et al. Gaussian Error Linear Units (GELUs), 2016.

[68] Csaba Tóth, et al. Bayesian Learning from Sequential Data using Gaussian Processes with Signature Covariances, 2019, ICML.

[69] Andreas Rößler, et al. On the approximation and simulation of iterated stochastic integrals and the corresponding Lévy areas in terms of a multidimensional Brownian motion, 2021, Stochastic Analysis and Applications.

[70] Patrick Kidger, et al. Deep Signatures, 2019, NeurIPS.

[71] Terry Lyons, et al. A Generalised Signature Method for Multivariate Time Series Feature Extraction, 2021.

[72] Lawrence F. Shampine. Stability of the leapfrog/midpoint method, 2009, Appl. Math. Comput.

[73] David Siska, et al. Robust Pricing and Hedging via Neural SDEs, 2020, SSRN Electronic Journal.

[74] E. Hannan, et al. Recursive estimation of mixed autoregressive-moving average order, 1982.

[75] Yuichi Yoshida, et al. Spectral Normalization for Generative Adversarial Networks, 2018, ICLR.

[76] David Duvenaud, et al. Infinitely Deep Bayesian Neural Networks with Stochastic Differential Equations, 2021, ArXiv.

[77] M. V. Tretyakov, et al. On the long-time integration of stochastic gradient systems, 2014, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences.

[78] David Duvenaud, et al. Invertible Residual Networks, 2018, ICML.

[79] Yiming Yang, et al. MMD GAN: Towards Deeper Understanding of Moment Matching Network, 2017, NIPS.