Agnostic Learning Nonconvex Function Classes

We consider the sample complexity of agnostic learning with respect to squared loss. It is known that if the function class F used for learning is convex, then one can obtain better sample complexity bounds than in the general case. It has been claimed that a lower bound shows there is an essential gap in the rate between convex and nonconvex classes. In this paper we show that the proof of this lower bound has a gap, although we do not provide a definitive answer as to its validity. More positively, we show that one can obtain "fast" sample complexity bounds for nonconvex F for "most" target conditional expectations. The new bounds depend on the detailed geometry of F, in particular on the distance, in a certain sense, of the target's conditional expectation from the set of nonuniqueness points of the class F.
