Fast learning rates for plug-in classifiers

It has recently been shown that, under the margin (or low noise) assumption, there exist classifiers attaining fast rates of convergence of the excess Bayes risk, that is, rates faster than $n^{-1/2}$. Work on this subject has suggested two conjectures: (i) the best achievable fast rate is of order $n^{-1}$, and (ii) plug-in classifiers generally converge more slowly than classifiers based on empirical risk minimization. We show that both conjectures are false. In particular, we construct plug-in classifiers that achieve not only fast but also super-fast rates, that is, rates faster than $n^{-1}$. We establish minimax lower bounds showing that the obtained rates cannot be improved.
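
To fix ideas, a plug-in classifier replaces the unknown regression function $\eta(x) = P(Y = 1 \mid X = x)$ in the Bayes rule $\mathbf{1}\{\eta(x) \ge 1/2\}$ by a nonparametric estimate $\hat\eta_n$; the margin assumption of Mammen and Tsybakov requires $P(0 < |\eta(X) - 1/2| \le t) \le C t^{\alpha}$ for some $\alpha > 0$, so that $\eta$ rarely hovers near the decision threshold. The sketch below is a minimal illustration of the plug-in idea using a Nadaraya-Watson kernel estimate of $\eta$; it is not the construction analyzed in the paper (which relies on local polynomial estimators), and the function name and `bandwidth` parameter are illustrative.

```python
import numpy as np

def plug_in_classifier(X_train, y_train, X_test, bandwidth=0.5):
    """Plug-in classification: estimate eta(x) = P(Y=1 | X=x) by
    Nadaraya-Watson kernel regression, then threshold at 1/2.
    Illustrative sketch only; the paper uses local polynomial estimators."""
    # Squared distances between each test point and all training points
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Gaussian kernel weights
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    # Kernel regression estimate of eta at each test point
    eta_hat = (w * y_train[None, :]).sum(axis=1) / w.sum(axis=1)
    # Plug the estimate into the Bayes rule: predict 1 iff eta_hat >= 1/2
    return (eta_hat >= 0.5).astype(int)

# Example on synthetic data with a smooth regression function
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
eta = 1.0 / (1.0 + np.exp(-4.0 * X[:, 0]))        # true eta(x)
y = (rng.uniform(size=500) < eta).astype(int)
X_new = rng.uniform(-1, 1, size=(5, 2))
print(plug_in_classifier(X, y, X_new))
```

The rate at which such a rule approaches the Bayes risk is driven by how well $\hat\eta_n$ estimates $\eta$ near the boundary $\{\eta = 1/2\}$; under the margin assumption, estimation errors away from the threshold do not change the predicted label, which is what makes fast and super-fast rates possible.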
