On Universal Approximation by Neural Networks with Uniform Guarantees on Approximation of Infinite Dimensional Maps

The study of universal approximation of arbitrary functions $f: \mathcal{X} \to \mathcal{Y}$ by neural networks has a rich and thorough history dating back to Kolmogorov (1957). In the case of learning finite dimensional maps, many authors have shown various forms of universality for both fixed-depth and fixed-width neural networks. However, these classical results often do not extend to the approximations of neural networks with infinitely many units that have recently been used for functional data analysis, dynamical systems identification, and other applications in which either $\mathcal{X}$ or $\mathcal{Y}$ is infinite dimensional. Two questions naturally arise: which infinite dimensional analogues of neural networks are sufficient to approximate any map $f: \mathcal{X} \to \mathcal{Y}$, and when do the finite approximations to these analogues used in practice approximate $f$ uniformly over its infinite dimensional domain $\mathcal{X}$? In this paper, we answer the open question of universal approximation of nonlinear operators when $\mathcal{X}$ and $\mathcal{Y}$ are both infinite dimensional. We show that, for a large class of infinite dimensional analogues of neural networks, any continuous map can be approximated arbitrarily closely under mild topological conditions on $\mathcal{X}$. Additionally, we provide the first lower bound on the minimal number of input and output units required by a finite approximation to an infinite neural network to guarantee that it can uniformly approximate any nonlinear operator using samples from its inputs and outputs.
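
For concreteness, the following is a minimal sketch of the objects discussed above; the specific integral form is an illustrative assumption about how "infinitely many units" is commonly formalized (in the spirit of infinite-network constructions such as [11]), not necessarily the exact construction analyzed in this paper. In the classical finite setting, a shallow network approximates $f$ as
$$ f(x) \;\approx\; \sum_{i=1}^{n} c_i\, \sigma\big(\langle w_i, x \rangle + b_i\big), $$
while an infinite-unit analogue replaces the sum over units with an integral against a measure $\mu$ over parameters,
$$ f(x) \;\approx\; \int c(w)\, \sigma\big(w(x) + b(w)\big)\, d\mu(w), $$
where, when $\mathcal{X}$ is infinite dimensional, $w(x)$ denotes a bounded linear functional applied to the input function $x$. A finite approximation to such a network reads $x$ only through finitely many input units (for example, point evaluations or a finite set of linear functionals) and represents its output through finitely many output units, which is the regime in which the uniform approximation question above is posed.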

[1] Matthias W. Seeger et al. Gaussian Processes for Machine Learning, 2004, Int. J. Neural Syst.

[2] Liwei Wang et al. The Expressive Power of Neural Networks: A View from the Width, 2017, NIPS.

[3] Roi Livni et al. Learning Infinite-Layer Networks: Beyond the Kernel Trick, 2016, ArXiv.

[4] Jonas Adler et al. Solving ill-posed inverse problems using iterative deep neural networks, 2017, ArXiv.

[5] Maxwell B. Stinchcombe et al. Neural network approximation of continuous functionals and continuous functions on compactifications, 1999, Neural Networks.

[6] Ken-ichi Funahashi et al. On the approximate realization of continuous mappings by neural networks, 1989, Neural Networks.

[7] G. Lewicki et al. Approximation by Superpositions of a Sigmoidal Function, 2003.

[8] Tommi S. Jaakkola et al. Steps Toward Deep Kernel Methods from Infinite Neural Networks, 2015, ArXiv.

[9] Philippe C. Besse et al. Autoregressive Forecasting of Some Functional Climatic Variations, 2000.

[10] Yee Whye Teh et al. Neural Processes, 2018, ArXiv.

[11] Christopher K. I. Williams. Computation with Infinite Neural Networks, 1998, Neural Computation.

[12] Lawrence K. Saul et al. Analysis and Extension of Arc-Cosine Kernels for Large Margin Classification, 2011, ArXiv.

[13] Geoffrey E. Hinton et al. Bayesian Learning for Neural Networks, 1995.

[14] B. Silverman et al. Functional Data Analysis, 1997.

[15] Xiaolei Qian. The Expressive Power of …, 1994.

[16] Hong Chen et al. Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems, 1995, IEEE Trans. Neural Networks.

[17] Roi Livni et al. Learning Infinite Layer Networks Without the Kernel Trick, 2017, ICML.

[18] Kurt Hornik et al. Approximation capabilities of multilayer feedforward networks, 1991, Neural Networks.

[19] A. N. Kolmogorov. On the representation of continuous functions of several variables as superpositions of continuous functions of one variable and addition, 1957.