Efficient and Accurate Gradients for Neural SDEs

Neural SDEs combine many of the best qualities of both RNNs and SDEs, and as such are a natural choice for modelling many types of temporal dynamics. They offer memory efficiency, high-capacity function approximation, and strong priors on model space. Neural SDEs may be trained as VAEs or as GANs; in either case it is necessary to backpropagate through the SDE solve. In particular, this may be done by constructing a backwards-in-time SDE whose solution yields the desired parameter gradients. However, this has previously suffered from severe speed and accuracy issues, due to high computational complexity, numerical errors in the SDE solve, and the cost of reconstructing Brownian motion. Here, we make several technical innovations to overcome these issues. First, we introduce the reversible Heun method: a new SDE solver that is algebraically reversible, which reduces numerical gradient errors to almost zero and improves several test metrics by substantial margins over the state of the art. Moreover, it requires half as many function evaluations as comparable solvers, giving up to a 1.98× speedup. Next, we introduce the Brownian interval: a new and computationally efficient way of exactly sampling and reconstructing Brownian motion, in contrast to previous reconstruction techniques, which are both approximate and relatively slow. This gives up to a 10.6× speed improvement over previous techniques. Finally, when specifically training Neural SDEs as GANs (Kidger et al. 2021), we demonstrate how SDE-GANs may be trained through careful weight clipping and choice of activation function. This reduces computational cost (giving up to a 1.87× speedup) and removes the truncation errors of the double adjoint required for gradient penalty, substantially improving several test metrics. Altogether, these techniques offer substantial improvements over the state of the art, with respect to both training speed and to classification, prediction, and MMD test metrics. We have contributed implementations of all of our techniques to the torchsde library to help facilitate their adoption.
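To make the algebraic reversibility concrete, the following is a minimal sketch of a single reversible Heun step and its exact inverse, assuming an SDE dY = f(t, Y) dt + g(t, Y) dW with diagonal noise. The function names, the toy drift and diffusion, and the plain-PyTorch setting are illustrative only, rather than the torchsde implementation itself, which additionally handles general noise types, adaptive stepping, and the backward-in-time adjoint SDE.

import torch

def reversible_heun_step(f, g, t0, t1, y, yhat, dW):
    # One forward step for dY = f(t, Y) dt + g(t, Y) dW with diagonal noise.
    # The solver state is the pair (y, yhat); in a full solver the evaluations
    # f(t1, yhat_next) and g(t1, yhat_next) would be cached and reused at the
    # next step, which is how the method halves the number of vector field
    # evaluations relative to the standard Heun scheme.
    dt = t1 - t0
    f0, g0 = f(t0, yhat), g(t0, yhat)
    yhat_next = 2 * y - yhat + f0 * dt + g0 * dW
    f1, g1 = f(t1, yhat_next), g(t1, yhat_next)
    y_next = y + 0.5 * (f0 + f1) * dt + 0.5 * (g0 + g1) * dW
    return y_next, yhat_next

def reversible_heun_step_reverse(f, g, t0, t1, y_next, yhat_next, dW):
    # Algebraic inverse of the forward step: recovers (y, yhat) from
    # (y_next, yhat_next) and the same Brownian increment dW, exactly up to
    # floating-point round-off, so the backward pass adds no truncation error.
    dt = t1 - t0
    f1, g1 = f(t1, yhat_next), g(t1, yhat_next)
    yhat = 2 * y_next - yhat_next - f1 * dt - g1 * dW
    f0, g0 = f(t0, yhat), g(t0, yhat)
    y = y_next - 0.5 * (f0 + f1) * dt - 0.5 * (g0 + g1) * dW
    return y, yhat

# Toy check with a hypothetical linear drift and constant diffusion.
f = lambda t, y: -y
g = lambda t, y: 0.1 * torch.ones_like(y)
y0 = yhat0 = torch.randn(4, 3, dtype=torch.float64)
dW = 0.01 ** 0.5 * torch.randn(4, 3, dtype=torch.float64)
y1, yhat1 = reversible_heun_step(f, g, 0.0, 0.01, y0, yhat0, dW)
y0_rec, yhat0_rec = reversible_heun_step_reverse(f, g, 0.0, 0.01, y1, yhat1, dW)
assert torch.allclose(y0_rec, y0) and torch.allclose(yhat0_rec, yhat0)

Because each step can be undone exactly, the backward pass can be run against the same numerical trajectory as the forward pass rather than an approximate reconstruction of it, which is what drives the reduction in numerical gradient error described above.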

[1] Greg Mori, et al. Modeling Continuous Stochastic Processes with Dynamic Normalizing Flows, 2020, NeurIPS.

[2] Terry Lyons, et al. Pathwise approximation of SDEs by coupling piecewise abelian rough paths, 2015, arXiv:1505.01298.

[3] S. Shreve. Stochastic Calculus for Finance II: Continuous-Time Models, 2010.

[4] A. Davie. KMT Theory Applied to Approximations of SDE, 2014.

[5] Patrick Kidger, et al. Signatory: differentiable computations of the signature and logsignature transforms, on both CPU and GPU, 2020, ICLR.

[6] Terry Lyons, et al. Neural SDEs Made Easy: SDEs are Infinite-Dimensional GANs, 2020.

[7] Franz J. Király, et al. Kernels for sequentially ordered data, 2016, J. Mach. Learn. Res.

[8] Ricky T. Q. Chen, et al. Scalable Gradients and Variational Inference for Stochastic Differential Equations, 2019, AABI.

[9] Ed H. Chi, et al. AntisymmetricRNN: A Dynamical System View on Recurrent Neural Networks, 2019, ICLR.

[10] Andrew S. Dickinson, et al. Optimal Approximation of the Second Iterated Integral of Brownian Motion, 2007.

[11] Edward De Brouwer, et al. GRU-ODE-Bayes: Continuous modeling of sporadically-observed time series, 2019, NeurIPS.

[12] Patrick Kidger, et al. Neural SDEs as Infinite-Dimensional GANs, 2021, ICML.

[13] Jing He, et al. Cautionary tales on air-quality improvement in Beijing, 2017, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences.

[14] Edwin V. Bonilla, et al. SigGPDE: Scaling Sparse Gaussian Processes on Sequential Data, 2021, ICML.

[15] Mihaela van der Schaar, et al. Time-series Generative Adversarial Networks, 2019, NeurIPS.

[16] Kenji Doya, et al. Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning, 2017, Neural Networks.

[17] Koen Claessen, et al. Splittable pseudorandom number generators using cryptographic hashing, 2013, Haskell '13.

[18] Andrew Gordon Wilson, et al. Averaging Weights Leads to Wider Optima and Better Generalization, 2018, UAI.

[19] Allan Pinkus, et al. Approximation theory of the MLP model in neural networks, 1999, Acta Numerica.

[20] Jaakko Lehtinen, et al. Analyzing and Improving the Image Quality of StyleGAN, 2020, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21] Pietro Lio, et al. Neural ODE Processes, 2021, ICLR.

[22] Stefan Winkler, et al. The Unusual Effectiveness of Averaging in GAN Training, 2018, ICLR.

[23] T. Faniran. Numerical Solution of Stochastic Differential Equations, 2015.

[24] Maxim Raginsky, et al. Neural Stochastic Differential Equations: Deep Latent Gaussian Models in the Diffusion Limit, 2019, ArXiv.

[25] Sekhar Tatikonda, et al. MALI: A memory efficient and reverse accurate integrator for Neural ODEs, 2021, ICLR.

[26] David Duvenaud, et al. Latent Ordinary Differential Equations for Irregularly-Sampled Time Series, 2019, NeurIPS.

[27] Terry Lyons, et al. The Signature Kernel Is the Solution of a Goursat PDE, 2020, SIAM J. Math. Data Sci.

[28] M. Yor, et al. Continuous martingales and Brownian motion, 1990.

[29] Aaron C. Courville, et al. Improved Training of Wasserstein GANs, 2017, NIPS.

[30] C. Holt. Author's retrospective on ‘Forecasting seasonals and trends by exponentially weighted moving averages’, 2004.

[31] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, ArXiv.

[32] Patrick Kidger, et al. Neural Rough Differential Equations for Long Time Series, 2021, ICML.

[33] David Duvenaud, et al. Residual Flows for Invertible Generative Modeling, 2019, NeurIPS.

[34] M. Wiktorsson. Joint characteristic function and simultaneous simulation of iterated Itô integrals for multiple independent Brownian motions, 2001.

[35] Jessica G. Gaines, et al. Random Generation of Stochastic Area Integrals, 1994, SIAM J. Appl. Math.

[36] Brownian bridge expansions for Lévy area approximations and particular values of the Riemann zeta function, 2021, ArXiv.

[37] Cheng-Yan Kao, et al. A stochastic differential equation model for quantifying transcriptional regulatory network in Saccharomyces cerevisiae, 2005, Bioinformatics.

[38] Andreas Griewank, et al. Achieving logarithmic growth of temporal and spatial complexity in reverse automatic differentiation, 1992.

[39] Richard S. Hamilton, et al. The inverse function theorem of Nash and Moser, 1982.

[40] Richard S. Zemel, et al. Generative Moment Matching Networks, 2015, ICML.

[41] T. Huillet. On Wright–Fisher diffusion and its relatives, 2007.

[42] Mathieu Blondel, et al. Momentum Residual Neural Networks, 2021, ICML.

[43] Andreas Rößler, et al. Runge-Kutta Methods for the Strong Approximation of Solutions of Stochastic Differential Equations, 2010, SIAM J. Numer. Anal.

[44] Austin R. Benson, et al. Neural Jump Stochastic Differential Equations, 2019, NeurIPS.

[45] Jimeng Sun, et al. SDE-Net: Equipping Deep Neural Networks with Uncertainty Estimates, 2020, ICML.

[46] Xiling Zhang, et al. On numerical approximations for stochastic differential equations, 2017.

[47] Bernhard Schölkopf, et al. A Kernel Two-Sample Test, 2012, J. Mach. Learn. Res.

[48] Mark A. Moraes, et al. Parallel random numbers: As easy as 1, 2, 3, 2011, International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[49] Calypso Herrera, et al. Neural Jump Ordinary Differential Equations: Consistent Continuous-Time Prediction and Filtering, 2020, ICLR.

[50] Jessica G. Gaines, et al. Variable Step Size Control in the Numerical Solution of Stochastic Differential Equations, 1997, SIAM J. Appl. Math.

[51] Patrick Kidger, et al. Universal Approximation with Deep Narrow Networks, 2019, COLT.

[52] W. Coffey, et al. The Langevin equation: with applications to stochastic problems in physics, chemistry, and electrical engineering, 2012.

[53] James Foster, et al. An Optimal Polynomial Approximation of Brownian Motion, 2019, SIAM J. Numer. Anal.

[54] F. Black, et al. The Pricing of Options and Corporate Liabilities, 1973, Journal of Political Economy.

[55] Cho-Jui Hsieh, et al. Neural SDE: Stabilizing Neural ODE Networks with Stochastic Noise, 2019, ArXiv.

[56] Peter R. Winters, et al. Forecasting Sales by Exponentially Weighted Moving Averages, 1960.

[57] Quoc V. Le, et al. Swish: a Self-Gated Activation Function, 2017, arXiv:1710.05941.

[58] David Duvenaud, et al. Neural Ordinary Differential Equations, 2018, NeurIPS.

[59] Terry Lyons, et al. Neural Controlled Differential Equations for Irregular Time Series, 2020, NeurIPS.

[60] Abhishek Kumar, et al. Score-Based Generative Modeling through Stochastic Differential Equations, 2020, ICLR.

[61] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[62] M. Arató. A famous nonlinear stochastic equation (Lotka-Volterra model with diffusion), 2003.

[63] Léon Bottou, et al. Wasserstein Generative Adversarial Networks, 2017, ICML.

[64] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.

[65] Maxim Raginsky, et al. Theoretical guarantees for sampling and inference in generative models with latent diffusions, 2019, COLT.

[66] Markus Heinonen, et al. ODE2VAE: Deep generative second order ODEs with Bayesian neural networks, 2019, NeurIPS.

[67] Kevin Gimpel, et al. Gaussian Error Linear Units (GELUs), 2016.

[68] Csaba Tóth, et al. Bayesian Learning from Sequential Data using Gaussian Processes with Signature Covariances, 2019, ICML.

[69] Andreas Rößler, et al. On the approximation and simulation of iterated stochastic integrals and the corresponding Lévy areas in terms of a multidimensional Brownian motion, 2021, Stochastic Analysis and Applications.

[70] Patrick Kidger, et al. Deep Signatures, 2019, NeurIPS.

[71] Terry Lyons, et al. A Generalised Signature Method for Multivariate Time Series Feature Extraction, 2021.

[72] Lawrence F. Shampine. Stability of the leapfrog/midpoint method, 2009, Appl. Math. Comput.

[73] David Siska, et al. Robust Pricing and Hedging via Neural SDEs, 2020, SSRN Electronic Journal.

[74] E. Hannan, et al. Recursive estimation of mixed autoregressive-moving average order, 1982.

[75] Yuichi Yoshida, et al. Spectral Normalization for Generative Adversarial Networks, 2018, ICLR.

[76] David Duvenaud, et al. Infinitely Deep Bayesian Neural Networks with Stochastic Differential Equations, 2021, ArXiv.

[77] M. V. Tretyakov, et al. On the long-time integration of stochastic gradient systems, 2014, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences.

[78] David Duvenaud, et al. Invertible Residual Networks, 2018, ICML.

[79] Yiming Yang, et al. MMD GAN: Towards Deeper Understanding of Moment Matching Network, 2017, NIPS.