Paper information - The importance of convexity in learning with squared loss

The importance of convexity in learning with squared loss

We show that if the closure of a function class $F$ under the metric induced by some probability distribution is not convex, then the sample complexity for agnostically learning $F$ with squared loss (using only hypotheses in $F$) is $\Omega(\ln(1/\delta)/\epsilon^2)$, where $1-\delta$ is the probability of success and $\epsilon$ is the required accuracy. In comparison, if the class $F$ is convex and has finite pseudodimension, then the sample complexity is $O\left(\frac{1}{\epsilon}\left(\ln\frac{1}{\epsilon} + \ln\frac{1}{\delta}\right)\right)$. If a nonconvex class $F$ has finite pseudodimension, then the sample complexity for agnostically learning the closure of the convex hull of $F$ is $O\left(\frac{1}{\epsilon}\left(\frac{1}{\epsilon}\ln\frac{1}{\epsilon} + \ln\frac{1}{\delta}\right)\right)$. Hence, for agnostic learning, learning the convex hull provides better approximation capabilities with little sample complexity penalty.
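To make the comparison between the three regimes concrete, the following minimal sketch evaluates their growth rates side by side. It assumes the rates are ln(1/δ)/ε² (lower bound when the closure of the class is not convex), (1/ε)(ln(1/ε) + ln(1/δ)) (upper bound for a convex class with finite pseudodimension), and (1/ε)((1/ε)ln(1/ε) + ln(1/δ)) (upper bound for the closed convex hull of a nonconvex class), as in the published abstract. All constants and the dependence on the pseudodimension are dropped, so the outputs are orders of growth rather than actual sample sizes. The function names are hypothetical and chosen only for illustration.

```python
import math


def nonconvex_lower_rate(eps, delta):
    """Growth rate ln(1/delta) / eps^2 (assumed lower bound, constants dropped)."""
    return math.log(1 / delta) / eps**2


def convex_upper_rate(eps, delta):
    """Growth rate (1/eps) * (ln(1/eps) + ln(1/delta)) (assumed upper bound, constants dropped)."""
    return (math.log(1 / eps) + math.log(1 / delta)) / eps


def hull_upper_rate(eps, delta):
    """Growth rate (1/eps) * ((1/eps) ln(1/eps) + ln(1/delta)) (assumed upper bound, constants dropped)."""
    return (math.log(1 / eps) / eps + math.log(1 / delta)) / eps


def compare_rates(eps, delta):
    """Return the three growth rates for accuracy eps and failure probability delta."""
    return {
        "nonconvex_lower": nonconvex_lower_rate(eps, delta),
        "convex_upper": convex_upper_rate(eps, delta),
        "hull_upper": hull_upper_rate(eps, delta),
    }


if __name__ == "__main__":
    for name, value in compare_rates(eps=0.01, delta=0.05).items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions, the ratio of the convex-hull rate to the nonconvex lower-bound rate is ln(1/ε)/ln(1/δ) + ε. That is only a logarithmic factor, which is one way to read the abstract's "little sample complexity penalty", whereas the convex rate is smaller by roughly a factor of 1/ε.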

Peter L. Bartlett | Wee Sun Lee | Robert C. Williamson

[1] J. Lamperti. On Convergence of Stochastic Processes , 1962 .

[2] R. A. Silverman,et al. Introductory Real Analysis , 1972 .

[3] Keinosuke Fukunaga,et al. Introduction to Statistical Pattern Recognition , 1972 .

[4] D. Braess. Nonlinear Approximation Theory , 1986 .

[5] Keinosuke Fukunaga,et al. Introduction to statistical pattern recognition (2nd ed.) , 1990 .

[6] H. Balsters,et al. Learnability with respect to fixed distributions , 1991 .

[7] Martin Anthony,et al. Computational learning theory: an introduction , 1992 .

[8] L. Jones. A Simple Lemma on Greedy Approximation in Hilbert Space and Convergence Rates for Projection Pursuit Regression and Neural Network Training , 1992 .

[9] R. Schapire. Toward Efficient Agnostic Learning , 1992 .

[10] David Haussler,et al. Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications , 1992, Inf. Comput..

[11] John Shawe-Taylor,et al. Bounding Sample Size with the Vapnik-Chervonenkis Dimension , 1993, Discrete Applied Mathematics.

[12] Andrew R. Barron,et al. Universal approximation bounds for superpositions of a sigmoidal function , 1993, IEEE Trans. Inf. Theory.

[13] Daniel F. McCaffrey,et al. Convergence rates for single hidden layer feedforward networks , 1994, Neural Networks.

[14] Robert E. Schapire,et al. Efficient distribution-free learning of probabilistic concepts , 1990, Proceedings [1990] 31st Annual Symposium on Foundations of Computer Science.

[15] Philip M. Long,et al. A Generalization of Sauer's Lemma , 1995, J. Comb. Theory A.

[16] Peter L. Bartlett,et al. On efficient agnostic learning of linear combinations of basis functions , 1995, COLT '95.

[17] Wolfgang Maass,et al. Agnostic PAC Learning of Functions on Analog Neural Nets , 1993, Neural Computation.

[18] D. Pollard. Uniform ratio limit theorems for empirical processes , 1995 .

[19] Y. Makovoz. Random Approximants and Neural Networks , 1996 .

[20] Peter L. Bartlett,et al. Efficient agnostic learning of neural networks with bounded fan-in , 1996, IEEE Trans. Inf. Theory.

[21] Jon A. Wellner,et al. Weak Convergence and Empirical Processes: With Applications to Statistics , 1996 .

[22] Sanjeev R. Kulkarni,et al. Covering numbers for real-valued function classes , 1997, IEEE Trans. Inf. Theory.

[23] Shai Ben-David,et al. Learning Distributions by Their Density Levels: A Paradigm for Learning without a Teacher , 1997, J. Comput. Syst. Sci..

[24] Peter L. Bartlett,et al. The Importance of Convexity in Learning with Squared Loss , 1998, IEEE Trans. Inf. Theory.