Complexity regularization via localized random penalties

In this article, model selection via penalized empirical loss minimization in nonparametric classification problems is studied. Data-dependent penalties are constructed, based on estimates of the complexity of a small subclass of each model class, containing only those functions with small empirical loss. The penalties are novel in that those considered in the literature are typically based on the entire model class. Oracle inequalities using these penalties are established, and the advantage of the new penalties over those based on the complexity of the whole model class is demonstrated.

1. Introduction. In this article we propose a new complexity-penalized model selection method based on data-dependent penalties. We consider the binary classification problem in which, given a random observation X ∈ ℝ^d, one has to predict Y ∈ {0, 1}. A classifier, or classification rule, is a function f : ℝ^d → {0, 1}, with loss L(f) = P{f(X) ≠ Y}.
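To make the construction concrete, the sketch below illustrates penalized empirical loss minimization with a localized penalty: for each (finite, for illustration) model class, the penalty is a Monte Carlo estimate of the empirical Rademacher average computed only over the functions whose empirical loss lies within a small radius of the class minimum, in the spirit of the penalties studied here. The paper itself contains no code; all names (`localized_rademacher_penalty`, `select_model`, the choice of `radius`, the Monte Carlo averaging) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def localized_rademacher_penalty(pred_matrix, y, radius, n_rounds=50):
    """Monte Carlo estimate of the empirical Rademacher average of the
    subclass of functions whose empirical loss is within `radius` of the
    smallest empirical loss in the class (a sketch, not the paper's code).

    pred_matrix: (m, n) array of {0,1} predictions; row j holds the
    predictions of the j-th function in the class on the n sample points.
    """
    losses = (pred_matrix != y).mean(axis=1)            # empirical loss of each function
    local = pred_matrix[losses <= losses.min() + radius]  # the small-empirical-loss subclass
    n = y.shape[0]
    total = 0.0
    for _ in range(n_rounds):
        sigma = rng.choice([-1.0, 1.0], size=n)         # i.i.d. Rademacher signs
        total += np.max(local @ sigma) / n              # sup over the localized subclass
    return total / n_rounds

def select_model(model_classes, X, y, radius=0.05):
    """Penalized empirical loss minimization: for each class, add the
    localized penalty to the minimal empirical loss, then pick the best."""
    best_score, best_rule = np.inf, None
    for cls in model_classes:                           # cls: list of functions X -> {0,1}
        preds = np.array([f(X) for f in cls])           # (m, n) prediction matrix
        losses = (preds != y).mean(axis=1)
        score = losses.min() + localized_rademacher_penalty(preds, y, radius)
        if score < best_score:
            best_score, best_rule = score, cls[int(np.argmin(losses))]
    return best_rule, best_score

# Toy usage: nested classes of one-dimensional threshold classifiers.
n = 200
X = rng.uniform(0.0, 1.0, size=n)
y = (X > 0.5).astype(int)
classes = [
    [(lambda x, t=t: (x > t).astype(int)) for t in np.linspace(0.0, 1.0, k)]
    for k in (3, 11, 101)
]
rule, score = select_model(classes, X, y)
```

The point of the localization is visible in the penalty routine: a rich class whose near-optimal functions form a small subclass is charged only for that subclass, whereas a penalty based on the complexity of the entire class would over-penalize it.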
