Paper information - On the mathematical foundations of learning

On the mathematical foundations of learning

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances.

(a) The understanding of language acquisition by children or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during, and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17] among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In

Felipe Cucker | Steve Smale

[1] I. J. Schoenberg. Metric spaces and completely monotone functions, 1938.

[2] N. Aronszajn. Theory of Reproducing Kernels, 1950.

[3] A. Kolmogorov, et al. Entropy and ε-capacity of sets in functional spaces, 1961.

[4] J. Lamperti. On convergence of stochastic processes, 1962.

[5] V. Hutson. Integral Equations, 1967, Nature.

[6] H. Osborn. The Morse index theorem, 1967.

[7] M. Birman, et al. Piecewise-polynomial approximations of functions of the classes $W_{p}^{\alpha}$, 1967.

[8] R. A. Silverman, et al. Introductory Real Analysis, 1972.

[9] Jean Duchon, et al. Splines minimizing rotation-invariant semi-norms in Sobolev spaces, 1976, Constructive Theory of Functions of Several Variables.

[10] S. Lang. Complex Analysis, 1977.

[11] Peter Craven, et al. Smoothing noisy data with spline functions, 1978.

[12] J. Meinguet. Multivariate interpolation at arbitrary points made simple, 1979.

[13] Leslie G. Valiant, et al. A theory of the learnable, 1984, STOC '84.

[14] A. Pinkus. n-Widths in Approximation Theory, 1985.

[15] S. Yau, et al. On the parabolic kernel of the Schrödinger operator, 1986.

[16] A. Pietsch. Eigenvalues and S-Numbers, 1987.

[17] W. S. McCulloch, et al. A logical calculus of the ideas immanent in nervous activity, 1990, The Philosophy of Artificial Intelligence.

[18] W. Pitts, et al. A Logical Calculus of the Ideas Immanent in Nervous Activity (1943), 2021, Ideas That Created the Future.

[19] J. Herod. Introduction to Hilbert spaces with applications, 1990.

[20] B. Carl, et al. Entropy, Compactness and the Approximation of Operators, 1990.

[21] G. Wahba. Spline models for observational data, 1990.

[22] Andrew R. Barron, et al. Complexity Regularization with Application to Artificial Neural Networks, 1991.

[23] David Haussler, et al. Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications, 1992, Inf. Comput.

[24] George G. Lorentz, et al. Constructive Approximation, 1993, Grundlehren der mathematischen Wissenschaften.

[25] Heekuck Oh, et al. Neural Networks for Pattern Recognition, 1993, Adv. Comput.

[26] M. Shubin. Partial Differential Equations VII: Spectral Theory of Differential Operators, 1994.

[27] A. Magnus. Constructive Approximation, Grundlehren der mathematischen Wissenschaften, Vol. 303, R. A. DeVore and G. G. Lorentz, Springer-Verlag, 1993, x + 449 pp., 1994.

[28] J. Navarro-Pedreño. Numerical Methods for Least Squares Problems, 1996.

[29] Peter L. Bartlett, et al. The importance of convexity in learning with squared loss, 1998, COLT '96.

[30] Michael Taylor, et al. Partial Differential Equations I: Basic Theory, 1996.

[31] Michael E. Taylor, et al. Partial Differential Equations, 1996.

[32] Åke Björck, et al. Numerical methods for least square problems, 1996.

[33] G. Lorentz, et al. Constructive approximation: advanced problems, 1996.

[34] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.

[35] H. Triebel, et al. Function Spaces, Entropy Numbers, Differential Operators: References, 1996.

[36] C. Darken, et al. Constructive Approximation Rates of Convex Approximation in Non-Hilbert Spaces, 2022.

[37] S. Smale. Mathematical problems for the next century, 1998.

[38] Vladimir Vapnik, et al. Statistical learning theory, 1998.

[39] Partha Niyogi, et al. The Informational Complexity of Learning, 1998, Springer US.

[40] Lenore Blum, et al. Complexity and Real Computation, 1997, Springer New York.

[41] Tomaso A. Poggio, et al. Machine Learning, Machine Vision, and the Brain, 1999, AI Mag.

[42] Michael Shub, et al. Newton's method for overdetermined systems of equations, 2000, Math. Comput.

[43] S. Geer. Empirical Processes in M-Estimation, 2000.

[44] Tomaso A. Poggio, et al. Regularization Networks and Support Vector Machines, 2000, Adv. Comput. Math.

[45] S. R. Jammalamadaka, et al. Empirical Processes in M-Estimation, 2001.

[46] Bernhard Schölkopf, et al. Generalization Performance of Regularization Networks and Support Vector Machines via Entropy Numbers of Compact Operators, 1998.

[47] S. Smale, et al. Estimating the approximation error in learning theory, 2003.

[48] Aaas News, et al. Book Reviews, 1893, Buffalo Medical and Surgical Journal.