Margin-adaptive model selection in statistical learning

A classical condition ensuring fast learning rates is the margin condition, introduced by Mammen and Tsybakov. In this paper we study adaptivity to this condition for model selection, within a general learning framework. We in fact consider a weaker version of the condition, which accounts for the fact that learning within a small model can be much easier than within a large one. Requiring this ``strong margin adaptivity'' makes the model selection problem more challenging. Our first main result shows, in a very general framework, that some penalization procedures (including those based on local Rademacher complexities) achieve this adaptivity when the models are nested; in contrast to previous results, this holds with penalties that depend only on the data. Our second main result shows that strong margin adaptivity is not always achievable when the models are not nested: for every model selection procedure (even a randomized one), there exists a problem on which it fails to be strongly margin adaptive.
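For background, one standard formulation of the Mammen--Tsybakov margin condition in binary classification is sketched below; the notation ($\eta$, $\kappa$, $\alpha$) is the usual one in the literature and is not taken from this abstract.

```latex
% Binary classification with $(X, Y) \in \mathcal{X} \times \{0, 1\}$,
% regression function $\eta(x) = \mathbb{P}(Y = 1 \mid X = x)$,
% risk $R(f) = \mathbb{P}(f(X) \neq Y)$, and Bayes classifier $f^*$.
% Margin condition: for some $\kappa \ge 1$ and $c > 0$,
\[
  \mathbb{P}\bigl( f(X) \neq f^*(X) \bigr)
  \;\le\;
  c \, \bigl( R(f) - R(f^*) \bigr)^{1/\kappa}
  \qquad \text{for every classifier } f .
\]
% Equivalently (up to constants), a low-noise condition on $\eta$:
% for some $C > 0$ and $\alpha > 0$,
\[
  \mathbb{P}\bigl( \lvert \eta(X) - \tfrac{1}{2} \rvert \le t \bigr)
  \;\le\;
  C \, t^{\alpha}
  \qquad \text{for all } t > 0 ,
\]
% with $\kappa = (1 + \alpha)/\alpha$: larger $\alpha$ (i.e. $\kappa$
% closer to $1$) means less noise near the decision boundary, which is
% what permits learning rates faster than $n^{-1/2}$.
```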

[1]  V. Vapnik and A. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities, 1971.

[2]  B. Efron. Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation, 1983.

[3]  Luc Devroye, et al. Lower bounds in pattern recognition and learning, 1995, Pattern Recognit.

[4]  Vladimir Vapnik. Statistical learning theory, 1998.

[5]  P. Massart, et al. Minimum contrast estimators on sieves: exponential bounds and rates of convergence, 1998.

[6]  P. Massart, et al. Risk bounds for model selection via penalization, 1999.

[7]  E. Mammen and A. Tsybakov. Smooth Discrimination Analysis, 1999.

[8]  Gábor Lugosi. Pattern Classification and Learning Theory, 2002.

[9]  A. Tsybakov. Optimal aggregation of classifiers in statistical learning, 2003.

[10]  Gilles Blanchard, et al. On the Rate of Convergence of Regularized Boosting Classifiers, 2003, J. Mach. Learn. Res.

[11]  Peter L. Bartlett, et al. Local Complexities for Empirical Risk Minimization, 2004, COLT.

[12]  G. Lugosi, et al. Complexity regularization via localized random penalties, 2004, math/0410091.

[13]  Jean-Yves Audibert. Classification under polynomial entropy and margin assumptions and randomized estimators, 2004.

[14]  S. van de Geer, et al. Square root penalty: Adaptation to the margin in classification and in edge estimation, 2005, math/0507422.

[15]  G. Lecué. Simultaneous adaptation to the margin and to complexity in classification, 2005, math/0509696.

[16]  P. Bartlett, et al. Local Rademacher complexities, 2005, math/0508275.

[17]  Michael I. Jordan, et al. Convexity, Classification, and Risk Bounds, 2006.

[18]  P. Massart, et al. Discussion: Local Rademacher complexities and oracle inequalities in risk minimization, 2006.

[19]  V. Koltchinskii. Local Rademacher complexities and oracle inequalities in risk minimization, 2006, 0708.0083.

[20]  V. Koltchinskii. Rejoinder: Local Rademacher complexities and oracle inequalities in risk minimization, 2006, 0708.0135.

[21]  Guillaume Lecué. Suboptimality of Penalized Empirical Risk Minimization in Classification, 2007, COLT.

[22]  P. Massart. Concentration inequalities and model selection, 2007.

[23]  P. Massart, et al. Risk bounds for statistical learning, 2007, math/0702683.

[24]  A. Tsybakov, et al. Fast learning rates for plug-in classifiers, 2007, 0708.2321.

[25]  A. Tsybakov. Discussion of ``2004 IMS Medallion Lecture: Local Rademacher complexities and oracle inequalities in risk minimization'' by V. Koltchinskii, 2007.

[26]  Sylvain Arlot. Technical appendix to ``V-fold cross-validation improved: V-fold penalization'', 2008, 0802.0566.

[27]  Sylvain Arlot. Model selection by resampling penalization, 2007, 0906.3124.

[28]  Pascal Massart, et al. Data-driven Calibration of Penalties for Least-Squares Regression, 2008, J. Mach. Learn. Res.