Embedding Capabilities of Neural ODEs

Neural ordinary differential equations (neural ODEs) are a class of neural networks that has attracted particular interest in recent years. We study input-output relations of neural ODEs using dynamical systems theory and prove several results about the exact embedding of maps in different neural ODE architectures in low and high dimensions. The embedding capability of a neural ODE architecture can be increased by, for example, adding a linear layer or augmenting the phase space. Yet there is currently no systematic theory available; our work contributes towards this goal by developing various embedding results and by identifying situations in which no embedding is possible. The main mathematical techniques are iterative functional equations, Morse functions, and suspension flows, together with several further ideas from analysis. Although in practice mainly universal approximation theorems are used, our geometric dynamical systems viewpoint on universal embedding provides a fundamental understanding of why certain neural ODE architectures perform better than others.
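The effect of augmenting the phase space can be illustrated with a small numerical sketch. It is not taken from the paper: the example map x -> -x, the vector fields, and all function names are assumptions chosen for illustration. The time-1 map of a scalar ODE with unique solutions is order-preserving, so it cannot equal x -> -x. After lifting x to (x, 0) in the plane, a rotation by the angle pi followed by a linear readout of the first coordinate reproduces the map.

```python
import math

def rk4_flow(f, state, t_end=1.0, steps=1000):
    """Integrate the autonomous ODE state' = f(state) from 0 to t_end with classical RK4."""
    h = t_end / steps
    s = list(state)
    for _ in range(steps):
        k1 = f(s)
        k2 = f([a + 0.5 * h * b for a, b in zip(s, k1)])
        k3 = f([a + 0.5 * h * b for a, b in zip(s, k2)])
        k4 = f([a + h * b for a, b in zip(s, k3)])
        s = [a + h / 6.0 * (b + 2 * c + 2 * d + e)
             for a, b, c, d, e in zip(s, k1, k2, k3, k4)]
    return s

def rotation_field(s):
    # Assumed 2D vector field: rigid rotation with angular speed pi,
    # so the time-1 flow rotates the plane by the angle pi.
    x, y = s
    return [-math.pi * y, math.pi * x]

def augmented_map(x):
    # Augment x -> (x, 0), flow for time 1, then read out the first coordinate.
    return rk4_flow(rotation_field, [x, 0.0])[0]

def scalar_field(s):
    # Hypothetical 1D vector field; any Lipschitz choice would serve here.
    return [math.sin(s[0]) - 0.5 * s[0]]

def scalar_map(x):
    # Time-1 map of the scalar ODE; it preserves the order of initial values.
    return rk4_flow(scalar_field, [x])[0]

if __name__ == "__main__":
    for x in (-2.5, -1.0, 0.5, 1.0):
        print(x, augmented_map(x), scalar_map(x))
```

Running the script shows the augmented map returning approximately -x for each input, while the scalar map keeps the inputs in their original order. The scalar case only demonstrates monotonicity for one sample field; the general statement follows from uniqueness of ODE solutions, not from this experiment.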
