Mathematical Methods for Supervised Learning

Let ρ be an unknown Borel measure defined on the space Z := X × Y with X ⊂ ℝ and Y = [−M, M]. Given a set z of m samples z_i = (x_i, y_i) drawn according to ρ, we consider the problem of estimating the regression function f_ρ from these samples. The main focus is on understanding the rate of approximation, measured either in expectation or in probability, that can be obtained under a given prior f_ρ ∈ Θ, i.e., under the assumption that f_ρ belongs to the set Θ, and on identifying algorithms that achieve optimal or semi-optimal (up to logarithms) rates. The optimal rate of decay in terms of m is established for many priors, given either in terms of the smoothness of f_ρ or in terms of its rate of approximation measured in one of several ways. This optimal rate is determined by two types of results. Upper bounds are established using various tools from approximation theory, such as entropy, widths, and linear and nonlinear approximation. Lower bounds are proved using Kullback-Leibler information together with Fano inequalities and a certain type of entropy. A distinction is drawn between algorithms that employ knowledge of the prior in the construction of the estimator and those that do not. Algorithms of the second type, which are universally optimal for a certain range of priors, are given.
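Concretely, f_ρ(x) is the conditional expectation of y given x, and the estimators under discussion minimize empirical risk over approximation spaces whose complexity is balanced against the sample size m. As a minimal illustration of this idea (not the paper's algorithm), the following Python sketch fits penalized least squares over dyadic piecewise-constant functions on X = [0, 1], selecting the partition depth without knowledge of the prior; the helper names, the log(m)/m penalty form, and the constant c_pen are assumptions made for this toy example.

```python
import numpy as np

def fit_piecewise_constant(x, y, depth):
    """Empirical least squares over 2**depth dyadic cells of [0, 1]:
    the minimizer is the sample mean of y on each cell."""
    n_cells = 2 ** depth
    bins = np.clip((x * n_cells).astype(int), 0, n_cells - 1)
    means = np.zeros(n_cells)
    for j in range(n_cells):
        in_cell = bins == j
        if in_cell.any():
            means[j] = y[in_cell].mean()
    return means

def empirical_risk(x, y, means, depth):
    """Mean squared residual of the piecewise-constant fit on the sample."""
    n_cells = 2 ** depth
    bins = np.clip((x * n_cells).astype(int), 0, n_cells - 1)
    return np.mean((y - means[bins]) ** 2)

def estimate_regression(x, y, M, max_depth=8, c_pen=1.0):
    """Prior-free model selection: pick the depth minimizing
    empirical risk + c_pen * (#cells) * log(m) / m, then truncate
    the fit to [-M, M].  c_pen is a hypothetical tuning constant."""
    m = len(x)
    best = None
    for depth in range(max_depth + 1):
        means = np.clip(fit_piecewise_constant(x, y, depth), -M, M)
        score = (empirical_risk(x, y, means, depth)
                 + c_pen * (2 ** depth) * np.log(m) / m)
        if best is None or score < best[0]:
            best = (score, depth, means)
    _, depth, means = best
    n_cells = 2 ** depth
    return lambda t: means[np.clip((np.asarray(t) * n_cells).astype(int),
                                   0, n_cells - 1)]

# Usage: m samples drawn from an (here synthetic) measure ρ on [0,1] × [−M,M].
rng = np.random.default_rng(0)
m, M = 500, 1.0
x = rng.uniform(0.0, 1.0, m)
y = np.clip(np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(m), -M, M)
f_hat = estimate_regression(x, y, M)
print(f_hat([0.1, 0.5, 0.9]))
```

The penalty trades the empirical fit against the number of cells, which is the mechanism by which such estimators adapt to an unknown smoothness class; the specific penalty shape and constant here are illustrative stand-ins, not the tuned quantities analyzed in the paper.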
