Prediction, Learning, Uniform Convergence, and Scale-Sensitive Dimensions

We present a new general-purpose algorithm for learning classes of [0, 1]-valued functions in a generalization of the prediction model and prove a general upper bound on the expected absolute error of this algorithm in terms of a scale-sensitive generalization of the Vapnik dimension proposed by Alon, Ben-David, Cesa-Bianchi, and Haussler. We give lower bounds implying that our upper bounds cannot be improved by more than a constant factor in general. We apply this result, together with techniques due to Haussler and to Benedek and Itai, to obtain new upper bounds on packing numbers in terms of this scale-sensitive notion of dimension. Using a different technique, we obtain new bounds on packing numbers in terms of Kearns and Schapire's fat-shattering function. We show how to apply both packing bounds to obtain improved general bounds on the sample complexity of agnostic learning. For each ε > 0, we establish weaker sufficient and stronger necessary conditions for a class of [0, 1]-valued functions to be agnostically learnable to within ε and to be an ε-uniform Glivenko-Cantelli class.
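
For readers unfamiliar with the quantities named above, the following sketch recalls their standard definitions. The notation (F, γ, r_i, fat_F, V_γ, M) is assumed here for illustration and may differ from the notation used in the paper itself.

```latex
% Assumed notation: F is a class of functions from a domain X to [0,1],
% and \gamma > 0 is the scale.

% gamma-shattering: F gamma-shatters x_1, ..., x_d in X if there exist
% witnesses r_1, ..., r_d in [0,1] such that for every b in {0,1}^d
% some f in F satisfies
\[
  f(x_i) \ge r_i + \gamma \ \text{ if } b_i = 1,
  \qquad
  f(x_i) \le r_i - \gamma \ \text{ if } b_i = 0,
  \qquad i = 1, \dots, d .
\]

% Fat-shattering function (Kearns and Schapire):
\[
  \operatorname{fat}_F(\gamma)
  = \max \{\, d : F \ \gamma\text{-shatters some } x_1, \dots, x_d \,\}.
\]

% Scale-sensitive Vapnik dimension (Alon, Ben-David, Cesa-Bianchi, Haussler):
% the same condition with a single common witness r_1 = ... = r_d = r, so
\[
  V_\gamma(F) \le \operatorname{fat}_F(\gamma).
\]

% Packing number: for a pseudometric \rho on F, a subset G of F is
% epsilon-separated if \rho(f, g) > \epsilon for all distinct f, g in G, and
\[
  M(\epsilon, F, \rho)
  = \max \{\, |G| : G \subseteq F \text{ is } \epsilon\text{-separated} \,\}.
\]
```

Both dimensions depend on the scale γ, which is why they are called scale-sensitive: a class can have infinite pseudo-dimension and still have a finite fat-shattering function at every fixed γ > 0.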
