How to train your neural ODE

Training neural ODEs on large datasets has not been tractable because the adaptive numerical ODE solver must be allowed to refine its step size to very small values. In practice, this leads to dynamics equivalent to many hundreds or even thousands of layers. In this paper, we overcome this apparent difficulty by introducing a theoretically grounded combination of optimal transport and stability regularizations, which encourages neural ODEs to prefer simpler dynamics among all the dynamics that solve a problem well. Simpler dynamics lead to faster convergence and to fewer solver discretization steps, considerably decreasing wall-clock time without loss in performance. Our approach allows us to train neural-ODE-based generative models to the same performance as the unregularized dynamics in just over a day on one GPU, whereas unregularized dynamics can take 4 to 6 days of training time on multiple GPUs. This brings neural ODEs significantly closer to practical relevance in large-scale applications.
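To make the idea of regularizing the dynamics concrete, the following is a minimal sketch rather than the paper's implementation. The abstract does not spell out the penalties, so the sketch assumes two plausible choices: a kinetic-energy term (the time integral of the squared norm of the velocity, in the spirit of optimal transport) and a squared Frobenius norm of the Jacobian of the dynamics (a stability-style penalty), estimated with random probe vectors in the manner of a Hutchinson-type estimator. The linear dynamics f(z) = A z, the fixed-step Euler integrator, and all function names are hypothetical stand-ins for a neural network and an adaptive solver.

```python
import itertools


def matvec(A, z):
    """Return A @ z for a list-of-lists matrix A and a list vector z."""
    return [sum(a * x for a, x in zip(row, z)) for row in A]


def vecmat(eps, A):
    """Return eps^T @ A (a vector-Jacobian product for linear dynamics)."""
    rows, cols = len(A), len(A[0])
    return [sum(eps[i] * A[i][j] for i in range(rows)) for j in range(cols)]


def sq_norm(v):
    return sum(x * x for x in v)


def frobenius_sq_estimate(A, probes):
    """Average ||eps^T A||^2 over probes; for probes with identity
    covariance the expectation equals the squared Frobenius norm of A."""
    return sum(sq_norm(vecmat(e, A)) for e in probes) / len(probes)


def integrate_with_penalties(A, z0, probes, t1=1.0, n_steps=100):
    """Euler-integrate dz/dt = A z and accumulate both assumed penalties."""
    dt = t1 / n_steps
    z = list(z0)
    kinetic = 0.0   # approximates the integral of ||f(z)||^2 dt
    jacobian = 0.0  # approximates the integral of ||df/dz||_F^2 dt
    for _ in range(n_steps):
        f = matvec(A, z)
        kinetic += dt * sq_norm(f)
        jacobian += dt * frobenius_sq_estimate(A, probes)
        z = [zi + dt * fi for zi, fi in zip(z, f)]
    return z, kinetic, jacobian


# Illustrative use: a rotation field in 2D, probed with all sign vectors.
A = [[0.0, -1.0], [1.0, 0.0]]
probes = [list(p) for p in itertools.product([-1.0, 1.0], repeat=2)]
z1, kinetic, jacobian = integrate_with_penalties(A, [1.0, 0.0], probes)

# Hypothetical weights; a real training loop would add the task loss here.
lambda_k, lambda_j = 0.01, 0.01
regularizer = lambda_k * kinetic + lambda_j * jacobian
```

In a training setting, a term like `regularizer` would be added to the task loss, so that among dynamics fitting the data equally well, those with lower velocity and smaller Jacobian are preferred. In practice the probes would be sampled at random rather than enumerated, and the vector-Jacobian product would come from automatic differentiation.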
