Dissecting Neural ODEs

Continuous deep learning architectures have recently re-emerged as variants of Neural Ordinary Differential Equations (Neural ODEs). The infinite-depth approach offered by these models theoretically bridges the gap between deep learning and dynamical systems; however, deciphering their inner workings remains an open challenge, and most current applications are limited to their inclusion as generic black-box modules. In this work, we "open the box" and offer a system-theoretic perspective, including state augmentation strategies and robustness, with the aim of clarifying the influence of several design choices on the underlying dynamics. We also introduce novel architectures: among them, a Galerkin-inspired depth-varying parameter model and Neural ODEs with data-controlled vector fields.
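To make the continuous-depth idea concrete, below is a minimal, self-contained sketch (not the authors' implementation) of a Neural ODE layer integrated with a fixed-step explicit Euler scheme, together with a "data-controlled" variant in which the vector field is additionally conditioned on the input state. It assumes PyTorch; the class names (VectorField, DataControlledField, NeuralODE) and hyperparameters are illustrative only.

```python
# Illustrative sketch of a Neural ODE forward pass, assuming PyTorch.
# dz/ds = f(s, z)            -> plain vector field
# dz/ds = f(s, z, x0)        -> data-controlled vector field (conditioned on the input x0)
import torch
import torch.nn as nn


class VectorField(nn.Module):
    """f(s, z): a small MLP defining the dynamics dz/ds."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))

    def forward(self, s, z):
        return self.net(z)


class DataControlledField(nn.Module):
    """f(s, z, x0): vector field conditioned on the initial state x0."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))

    def forward(self, s, z, x0):
        return self.net(torch.cat([z, x0], dim=-1))


class NeuralODE(nn.Module):
    """Integrates the dynamics from s=0 to s=1 with explicit Euler steps."""
    def __init__(self, field, steps=20, data_controlled=False):
        super().__init__()
        self.field = field
        self.steps = steps
        self.data_controlled = data_controlled

    def forward(self, x):
        z, ds = x, 1.0 / self.steps
        for k in range(self.steps):
            s = k * ds
            dz = self.field(s, z, x) if self.data_controlled else self.field(s, z)
            z = z + ds * dz  # explicit Euler update
        return z


if __name__ == "__main__":
    x = torch.randn(8, 2)
    print(NeuralODE(VectorField(2))(x).shape)                                # torch.Size([8, 2])
    print(NeuralODE(DataControlledField(2), data_controlled=True)(x).shape)  # torch.Size([8, 2])
```

In practice an adaptive-step solver with adjoint or backpropagation-through-the-solver gradients would replace the Euler loop; the sketch only shows how a data-controlled vector field differs from a plain one by receiving the input x alongside the evolving state z.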
