On the mathematical foundations of learning

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances.

(a) The understanding of language acquisition by children or the emergence of languages in early human cultures.

(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during, and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).

(c) Pattern recognition of objects ranging from handwritten letters of the alphabet to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17] among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In

[1]  I. J. Schoenberg Metric spaces and completely monotone functions , 1938 .

[2]  N. Aronszajn Theory of Reproducing Kernels , 1950 .

[3]  A. Kolmogorov,et al.  Entropy and ε-capacity of sets in functional spaces , 1961 .

[4]  J. Lamperti ON CONVERGENCE OF STOCHASTIC PROCESSES , 1962 .

[5]  V. Hutson Integral Equations , 1967, Nature.

[6]  H. Osborn The Morse index theorem , 1967 .

[7]  M. Birman,et al.  PIECEWISE-POLYNOMIAL APPROXIMATIONS OF FUNCTIONS OF THE CLASSES $W_{p}^{\alpha}$ , 1967 .

[8]  R. A. Silverman,et al.  Introductory Real Analysis , 1972 .

[9]  Jean Duchon,et al.  Splines minimizing rotation-invariant semi-norms in Sobolev spaces , 1976, Constructive Theory of Functions of Several Variables.

[10]  S. Lang Complex Analysis , 1977 .

[11]  Peter Craven,et al.  Smoothing noisy data with spline functions , 1978 .

[12]  J. Meinguet Multivariate interpolation at arbitrary points made simple , 1979 .

[13]  Leslie G. Valiant,et al.  A theory of the learnable , 1984, STOC '84.

[14]  A. Pinkus n-Widths in Approximation Theory , 1985 .

[15]  S. Yau,et al.  On the parabolic kernel of the Schrödinger operator , 1986 .

[16]  A. Pietsch Eigenvalues and S-Numbers , 1987 .

[17]  W S McCulloch,et al.  A logical calculus of the ideas immanent in nervous activity , 1990, The Philosophy of Artificial Intelligence.

[18]  W. Pitts,et al.  A Logical Calculus of the Ideas Immanent in Nervous Activity (1943) , 2021, Ideas That Created the Future.

[19]  J. Herod Introduction to Hilbert spaces with applications , 1990 .

[20]  B. Carl,et al.  Entropy, Compactness and the Approximation of Operators , 1990 .

[21]  G. Wahba Spline models for observational data , 1990 .

[22]  Andrew R. Barron,et al.  Complexity Regularization with Application to Artificial Neural Networks , 1991 .

[23]  David Haussler,et al.  Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications , 1992, Inf. Comput..

[24]  George G. Lorentz,et al.  Constructive Approximation , 1993, Grundlehren der mathematischen Wissenschaften.

[25]  Heekuck Oh,et al.  Neural Networks for Pattern Recognition , 1993, Adv. Comput..

[26]  M. Shubin Partial Differential Equations VII : Spectral Theory of Differential Operators , 1994 .

[27]  A. Magnus Constructive Approximation, Grundlehren der mathematischen Wissenschaften, Vol. 303, R. A. DeVore and G. G. Lorentz, Springer-Verlag, 1993, x + 449 pp. , 1994 .

[28]  J. Navarro-Pedreño Numerical Methods for Least Squares Problems , 1996 .

[29]  Peter L. Bartlett,et al.  The importance of convexity in learning with squared loss , 1998, COLT '96.

[30]  Michael Taylor,et al.  Partial Differential Equations I: Basic Theory , 1996 .

[31]  Michael E. Taylor,et al.  Partial Differential Equations , 1996 .

[32]  Åke Björck,et al.  Numerical methods for least square problems , 1996 .

[33]  G. Lorentz,et al.  Constructive approximation : advanced problems , 1996 .

[34]  Thomas G. Dietterich What is machine learning? , 2020, Archives of Disease in Childhood.

[35]  H. Triebel,et al.  Function Spaces, Entropy Numbers, Differential Operators: References , 1996 .

[36]  C. Darken,et al.  Constructive Approximation Rates of Convex Approximation in Non-hilbert Spaces , 2022 .

[37]  S. Smale Mathematical problems for the next century , 1998 .

[38]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[39]  Partha Niyogi,et al.  The Informational Complexity of Learning , 1998, Springer US.

[40]  Lenore Blum,et al.  Complexity and Real Computation , 1997, Springer New York.

[41]  Tomaso A. Poggio,et al.  Machine Learning, Machine Vision, and the Brain , 1999, AI Mag..

[42]  Michael Shub,et al.  Newton's method for overdetermined systems of equations , 2000, Math. Comput..

[43]  S. Geer Empirical Processes in M-Estimation , 2000 .

[44]  Tomaso A. Poggio,et al.  Regularization Networks and Support Vector Machines , 2000, Adv. Comput. Math..

[45]  S. R. Jammalamadaka,et al.  Empirical Processes in M-Estimation , 2001 .

[46]  Bernhard Schölkopf,et al.  Generalization Performance of Regularization Networks and Support Vector Machines via Entropy Numbers of Compact Operators , 1998 .

[47]  S. Smale,et al.  ESTIMATING THE APPROXIMATION ERROR IN LEARNING THEORY , 2003 .

[48]  Aaas News,et al.  Book Reviews , 1893, Buffalo Medical and Surgical Journal.