Approximation in Learning Theory

This paper addresses some problems of supervised learning in the setting formulated by Cucker and Smale. Supervised learning, or learning from examples, refers to a process that uses available data of inputs x_i and outputs y_i, i = 1,...,m, to build a function that best represents the relation between the inputs x ∈ X and the corresponding outputs y ∈ Y. The goal is to find an estimator f_z, based on the given data z := ((x_1,y_1),...,(x_m,y_m)), that approximates well the regression function f_ρ (or its projection) of an unknown Borel probability measure ρ defined on Z = X × Y. We assume that (x_i,y_i), i = 1,...,m, are independent and distributed according to ρ. We discuss the following two problems: I. the projection learning problem (improper function learning problem); II. universal (adaptive) estimators in the proper function learning problem.

In the first problem we do not impose any restrictions on the Borel measure ρ except our standard assumption that |y| ≤ M a.e. with respect to ρ. In this case we use the data z to estimate (approximate) the L_2(ρ_X) projection (f_ρ)_W of f_ρ onto a function class W of our choice. Here, ρ_X is the marginal probability measure on X. In [KT1,2] this problem has been studied for W satisfying the decay condition ε_n(W,B) ≤ Dn^{-r} on the entropy numbers ε_n(W,B) of W in a Banach space B, in the case B = C(X) or B = L_2(ρ_X). In this paper we obtain upper estimates in the case ε_n(W,L_1(ρ_X)) ≤ Dn^{-r} under the extra assumption that W is convex.

In the second problem we assume that the unknown measure ρ satisfies some conditions. Following the standard approach in nonparametric statistics, we formulate these conditions in the form f_ρ ∈ Θ. Next, we assume that the only a priori information available is that f_ρ belongs to a class Θ (unknown) from a known collection {Θ} of classes. We want to build an estimator that provides an approximation of f_ρ that is close to optimal for the class Θ.
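For orientation, the objects named above can be written out as follows. This is a sketch of the standard definitions in the Cucker-Smale setting, not a quotation from the paper: the symbol \mathcal{E} for the error functional and the convention of 2^n balls with centers in W for the entropy numbers are assumptions, and the projection is written as an argmin on the understanding that a minimizer exists (for example, when W is closed and convex).

```latex
% Regression function of \rho (conditional mean of y given x)
f_\rho(x) := \int_Y y \, d\rho(y \mid x)

% Error functional (assumed notation) and its link to the L_2(\rho_X) distance
\mathcal{E}(f) := \int_Z (f(x) - y)^2 \, d\rho,
\qquad
\mathcal{E}(f) - \mathcal{E}(f_\rho) = \| f - f_\rho \|_{L_2(\rho_X)}^2

% Projection of f_\rho onto the class W
(f_\rho)_W := \arg\min_{f \in W} \| f - f_\rho \|_{L_2(\rho_X)}

% Entropy numbers of W in B (assumed convention); U(B) is the unit ball of B
\varepsilon_n(W, B) := \inf \Bigl\{ \varepsilon : \exists\, f_1, \dots, f_{2^n} \in W
  \text{ such that } W \subset \bigcup_{j=1}^{2^n} \bigl( f_j + \varepsilon U(B) \bigr) \Bigr\}
```

The identity for \mathcal{E}(f) - \mathcal{E}(f_\rho) is what makes least squares on the data a natural way to approximate f_ρ in the L_2(ρ_X) norm.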
Along with standard penalized least squares estimators we consider a new method of constructing universal estimators. This method combines two powerful ideas for building universal estimators. The first is the use of penalized least squares estimators. This idea works well in a general setting with rather abstract methods of approximation. The second is the idea of thresholding, which works very well when wavelet expansions are used as the approximation tool. A new estimator, which we call the big jump estimator, uses least squares estimators and chooses the right model by a thresholding criterion instead of penalization. In this paper we illustrate how ideas and methods of approximation theory can be used in learning theory, both in formulating a problem and in solving it.
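The penalized least squares idea mentioned above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the models are piecewise constant functions on 2^j equal cells of [0, 1), the penalty has the form lam * (number of cells) * log(m) / m, and the function names are hypothetical. The thresholding criterion of the big jump estimator is not specified in this abstract, so it is not implemented.

```python
import numpy as np

def fit_piecewise_constant(x, y, n_cells):
    """Least squares fit over functions constant on n_cells equal cells of [0, 1)."""
    idx = np.minimum((x * n_cells).astype(int), n_cells - 1)
    sums = np.bincount(idx, weights=y, minlength=n_cells)
    counts = np.bincount(idx, minlength=n_cells)
    # The cell average minimizes the empirical squared error; empty cells get 0.
    values = np.divide(sums, counts, out=np.zeros(n_cells), where=counts > 0)
    return values, idx

def penalized_least_squares(x, y, max_level=6, lam=1.0):
    """Pick the level j minimizing empirical risk plus an assumed complexity penalty."""
    m = len(x)
    best = None
    for j in range(max_level + 1):
        n_cells = 2 ** j
        values, idx = fit_piecewise_constant(x, y, n_cells)
        empirical_risk = np.mean((y - values[idx]) ** 2)
        penalty = lam * n_cells * np.log(m) / m  # illustrative penalty, an assumption
        score = empirical_risk + penalty
        if best is None or score < best[0]:
            best = (score, j, values)
    _, level, values = best
    return level, values

def predict(values, x):
    """Evaluate the selected piecewise constant estimator at the points x."""
    n_cells = len(values)
    idx = np.minimum((np.asarray(x) * n_cells).astype(int), n_cells - 1)
    return values[idx]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, size=2000)
    # Bounded outputs, in line with the assumption |y| <= M.
    y = np.where(x < 0.5, 1.0, -1.0) + rng.uniform(-0.1, 0.1, size=2000)
    level, values = penalized_least_squares(x, y)
    print("selected level:", level)
    print("prediction at 0.25 and 0.75:", predict(values, [0.25, 0.75]))
```

The estimator is universal in the informal sense that the level j is chosen from the data alone, without knowing in advance which model class fits f_ρ best. A thresholding-based selection would replace the penalty comparison in the loop with a different stopping rule.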

[1]  B. Carl Entropy numbers, s-numbers, and eigenvalue problems , 1981 .

[2]  V. Temlyakov Approximation by elements of a finite-dimensional subspace of functions from various Sobolev or Nikol'skii spaces , 1988 .

[3]  G. Pisier The volume of convex bodies and Banach space geometry , 1989 .

[4]  V. N. Temlyakov Approximation of periodic functions , 1993 .

[5]  Vladimir Temlyakov Nonlinear Kolmogorov widths , 1998 .

[6]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[7]  Peter L. Bartlett,et al.  The Importance of Convexity in Learning with Squared Loss , 1998, IEEE Trans. Inf. Theory.

[8]  P. Massart,et al.  Risk bounds for model selection via penalization , 1999 .

[9]  Tomaso A. Poggio,et al.  Regularization Networks and Support Vector Machines , 2000, Adv. Comput. Math..

[10]  Felipe Cucker,et al.  On the mathematical foundations of learning , 2001 .

[11]  Gábor Lugosi,et al.  Pattern Classification and Learning Theory , 2002 .

[12]  Adam Krzyzak,et al.  A Distribution-Free Theory of Nonparametric Regression , 2002, Springer series in statistics.

[13]  V. Temlyakov Nonlinear Methods of Approximation , 2003, Found. Comput. Math..

[14]  Wolfgang Dahmen,et al.  Universal Algorithms for Learning Theory Part I : Piecewise Constant Functions , 2005, J. Mach. Learn. Res..

[15]  T. Poggio,et al.  The Mathematics of Learning: Dealing with Data , 2005, 2005 International Conference on Neural Networks and Brain.

[16]  Vladimir Temlyakov,et al.  Optimal estimators in learning theory , 2006 .