Fast learning rates for plug-in classifiers

It has been recently shown that, under the margin (or low noise) assumption, there exist classifiers attaining fast rates of convergence of the excess Bayes risk, i.e., rates faster than $n^{-1/2}$. The works on this subject suggested the following two conjectures: (i) the best achievable fast rate is of the order $n^{-1}$, and (ii) plug-in classifiers generally converge more slowly than classifiers based on empirical risk minimization. We show that both conjectures are false. In particular, we construct plug-in classifiers that can achieve not only fast rates, but also super-fast rates, i.e., rates faster than $n^{-1}$. We establish minimax lower bounds showing that the obtained rates cannot be improved.
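For orientation, here is a minimal formal sketch of the objects the abstract names, written in the standard notation of this literature (the symbols $\eta$, $\hat\eta_n$, $C$, and $\alpha$ are conventional choices, not quoted from the paper itself). A plug-in classifier is built from an estimator $\hat\eta_n$ of the regression function $\eta(x) = \mathbb{P}(Y = 1 \mid X = x)$:

$$ \hat f_n(x) = \mathbb{1}\{\hat\eta_n(x) \ge 1/2\}, $$

and its excess Bayes risk is

$$ \mathcal{E}(\hat f_n) = \mathbb{P}(Y \ne \hat f_n(X)) - \mathbb{P}(Y \ne f^*(X)), \qquad f^*(x) = \mathbb{1}\{\eta(x) \ge 1/2\}, $$

where $f^*$ is the Bayes classifier. The margin (low noise) assumption with parameter $\alpha \ge 0$ is usually stated as

$$ \mathbb{P}\bigl(0 < |\eta(X) - 1/2| \le t\bigr) \le C\, t^{\alpha} \quad \text{for all } t > 0, $$

i.e., $\eta$ rarely takes values near the decision boundary $1/2$. In this vocabulary, "fast rates" are bounds on $\mathcal{E}(\hat f_n)$ decaying faster than $n^{-1/2}$, and "super-fast rates" are those decaying faster than $n^{-1}$.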
