Interpolation Technique to Speed Up Gradients Propagation in Neural ODEs

We propose a simple interpolation-based method for efficiently approximating gradients in neural ODE models. We compare it with the reverse dynamic method (known in the literature as the "adjoint method") for training neural ODEs on classification, density estimation, and inference approximation tasks. We also provide a theoretical justification of our approach using the logarithmic norm formalism. Our method trains models faster than the reverse dynamic method, as confirmed by extensive numerical experiments on several standard benchmarks.
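To make the idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the interpolation replaces recomputation of the state trajectory during the backward pass: the forward solution is stored on a Chebyshev grid and evaluated by barycentric Lagrange interpolation while the adjoint equation is integrated backward in time. The toy dynamics dz/dt = theta * z, the loss L = z(T), the fixed-step RK4 integrator, and all function names are hypothetical choices made for illustration only.

```python
import math

def chebyshev_nodes(n, t0, t1):
    """Chebyshev points of the second kind mapped to [t0, t1] (ordered from t1 to t0)."""
    return [0.5 * (t0 + t1) + 0.5 * (t1 - t0) * math.cos(math.pi * j / n)
            for j in range(n + 1)]

def barycentric_weights(n):
    """Barycentric weights for Chebyshev points of the second kind."""
    w = [(-1.0) ** j for j in range(n + 1)]
    w[0] *= 0.5
    w[n] *= 0.5
    return w

def interpolate(t, nodes, values, weights):
    """Evaluate the barycentric Lagrange interpolant at time t."""
    num, den = 0.0, 0.0
    for tj, zj, wj in zip(nodes, values, weights):
        if t == tj:          # exactly on a node: return the stored value
            return zj
        c = wj / (t - tj)
        num += c * zj
        den += c
    return num / den

def grad_via_interpolated_adjoint(theta, z0, T, n_nodes=16, n_steps=200):
    """Approximate dL/dtheta for dz/dt = theta * z, L = z(T) (toy assumption)."""
    nodes = chebyshev_nodes(n_nodes, 0.0, T)
    weights = barycentric_weights(n_nodes)
    # Stand-in for the forward solver output stored at the grid points.
    values = [z0 * math.exp(theta * t) for t in nodes]

    def rhs(t, a, g):
        # Adjoint a(t): da/dt = -a * df/dz = -a * theta.
        # Gradient accumulator g(t): dg/dt = -a * df/dtheta = -a * z(t),
        # where z(t) is read from the interpolant instead of being re-solved.
        z = interpolate(t, nodes, values, weights)
        return -a * theta, -a * z

    a, g, t = 1.0, 0.0, T    # a(T) = dL/dz(T) = 1, g(T) = 0
    h = -T / n_steps         # negative step: integrate from T back to 0
    for _ in range(n_steps):
        k1 = rhs(t, a, g)
        k2 = rhs(t + h / 2, a + h / 2 * k1[0], g + h / 2 * k1[1])
        k3 = rhs(t + h / 2, a + h / 2 * k2[0], g + h / 2 * k2[1])
        k4 = rhs(t + h, a + h * k3[0], g + h * k3[1])
        a += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        g += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += h
    return g                 # g(0) approximates dL/dtheta

if __name__ == "__main__":
    theta, z0, T = -0.5, 2.0, 1.0
    approx = grad_via_interpolated_adjoint(theta, z0, T)
    exact = T * z0 * math.exp(theta * T)  # closed-form gradient for this toy problem
    print(approx, exact)
```

In this toy setting the backward pass integrates only the adjoint and the gradient accumulator; the state is never re-solved in reverse time. That is the cost saving the sketch is meant to illustrate, under the assumptions stated above.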
