Dissecting Neural ODEs

Continuous deep learning architectures have recently re-emerged as variants of Neural Ordinary Differential Equations (Neural ODEs). The infinite-depth approach offered by these models theoretically bridges the gap between deep learning and dynamical systems; however, deciphering their inner workings remains an open challenge, and most current applications are limited to their inclusion as generic black-box modules. In this work, we "open the box" and offer a system-theoretic perspective, including state augmentation strategies and robustness, with the aim of clarifying the influence of several design choices on the underlying dynamics. We also introduce novel architectures: among them, a Galerkin-inspired depth-varying parameter model and Neural ODEs with data-controlled vector fields.
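To make the continuous-depth idea concrete, below is a minimal, self-contained sketch (not the authors' implementation) of a Neural ODE layer integrated with a fixed-step explicit Euler scheme, together with a "data-controlled" variant in which the vector field is additionally conditioned on the input state. It assumes PyTorch; the class names (VectorField, DataControlledField, NeuralODE) and hyperparameters are illustrative only.

```python
# Illustrative sketch of a Neural ODE forward pass, assuming PyTorch.
# dz/ds = f(s, z)            -> plain vector field
# dz/ds = f(s, z, x0)        -> data-controlled vector field (conditioned on the input x0)
import torch
import torch.nn as nn


class VectorField(nn.Module):
    """f(s, z): a small MLP defining the dynamics dz/ds."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))

    def forward(self, s, z):
        return self.net(z)


class DataControlledField(nn.Module):
    """f(s, z, x0): vector field conditioned on the initial state x0."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))

    def forward(self, s, z, x0):
        return self.net(torch.cat([z, x0], dim=-1))


class NeuralODE(nn.Module):
    """Integrates the dynamics from s=0 to s=1 with explicit Euler steps."""
    def __init__(self, field, steps=20, data_controlled=False):
        super().__init__()
        self.field = field
        self.steps = steps
        self.data_controlled = data_controlled

    def forward(self, x):
        z, ds = x, 1.0 / self.steps
        for k in range(self.steps):
            s = k * ds
            dz = self.field(s, z, x) if self.data_controlled else self.field(s, z)
            z = z + ds * dz  # explicit Euler update
        return z


if __name__ == "__main__":
    x = torch.randn(8, 2)
    print(NeuralODE(VectorField(2))(x).shape)                                # torch.Size([8, 2])
    print(NeuralODE(DataControlledField(2), data_controlled=True)(x).shape)  # torch.Size([8, 2])
```

In practice an adaptive-step solver with adjoint or backpropagation-through-the-solver gradients would replace the Euler loop; the sketch only shows how a data-controlled vector field differs from a plain one by receiving the input x alongside the evolving state z.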
