Approximation theory of the MLP model in neural networks

In this survey we discuss various approximation-theoretic problems that arise in the multilayer feedforward perceptron (MLP) model in neural networks. The MLP model is one of the more popular and practical of the many neural network models. Mathematically, it is also one of the simpler models. Nonetheless, the mathematics of this model is not well understood, and many of the open problems are approximation-theoretic in character. Most of the research we discuss is of very recent vintage. We report on what has been done and on various unanswered questions. We do not present practical (algorithmic) methods; instead, we explore the capabilities and limitations of this model.
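
To make the object of study concrete, the sketch below evaluates the single-hidden-layer case of the MLP model. It assumes the standard form sum_i c_i * sigma(<w_i, x> - theta_i) with a logistic activation sigma. The names `one_hidden_layer_mlp` and `bump` are hypothetical and exist only for illustration. The `bump` example uses two steep sigmoids whose difference approximates the indicator function of an interval, which gives a rough idea of why such sums can approximate broad classes of functions. It is not a training algorithm and not a method taken from the survey.

```python
import math
from typing import Callable, Sequence


def logistic(t: float) -> float:
    """Numerically stable logistic sigmoid 1 / (1 + e^{-t})."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)  # avoids overflow of exp(-t) for very negative t
    return z / (1.0 + z)


def one_hidden_layer_mlp(
    x: Sequence[float],
    weights: Sequence[Sequence[float]],  # w_i in R^d, one vector per hidden unit
    thresholds: Sequence[float],         # theta_i, one per hidden unit
    coeffs: Sequence[float],             # c_i, the output-layer coefficients
    sigma: Callable[[float], float] = logistic,
) -> float:
    """Evaluate sum_i c_i * sigma(<w_i, x> - theta_i) (hypothetical helper)."""
    if not (len(weights) == len(thresholds) == len(coeffs)):
        raise ValueError("need one weight vector, threshold and coefficient per unit")
    total = 0.0
    for w, theta, c in zip(weights, thresholds, coeffs):
        inner = sum(wj * xj for wj, xj in zip(w, x))  # ridge direction <w_i, x>
        total += c * sigma(inner - theta)
    return total


def bump(x: float, a: float = 0.0, b: float = 1.0, k: float = 50.0) -> float:
    """Two hidden units: sigma(k(x - a)) - sigma(k(x - b)).

    For large k this is close to 1 inside (a, b) and close to 0 away from it.
    """
    return one_hidden_layer_mlp(
        [x], weights=[[k], [k]], thresholds=[k * a, k * b], coeffs=[1.0, -1.0]
    )


if __name__ == "__main__":
    for point in (-1.0, 0.5, 2.0):
        print(point, bump(point))
```

Raising the steepness `k` sharpens the transition at the endpoints. Summing many such bumps with suitable coefficients gives a crude piecewise-constant approximant, which is one informal way to see the density results that the survey examines rigorously.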
