Local Rademacher complexities and oracle inequalities in risk minimization

Let $\mathcal{F}$ be a class of measurable functions $f : S \to [0, 1]$ defined on a probability space $(S, \mathcal{A}, P)$. Given a sample $(X_1, \dots, X_n)$ of i.i.d. random variables taking values in $S$ with common distribution $P$, let $P_n$ denote the empirical measure based on $(X_1, \dots, X_n)$. We study the empirical risk minimization problem $P_n f \to \min$, $f \in \mathcal{F}$. Given a solution $\hat{f}_n$ of this problem, the goal is to obtain very general upper bounds on its excess risk $\mathcal{E}_P(\hat{f}_n) := P\hat{f}_n - \inf_{f \in \mathcal{F}} Pf$, expressed in terms of relevant geometric parameters of the class $\mathcal{F}$. Using concentration inequalities and other empirical process tools, we obtain both distribution-dependent and data-dependent upper bounds on the excess risk that are of asymptotically correct order in many examples. The bounds involve localized sup-norms of empirical and Rademacher processes indexed by functions from the class. We use these bounds to develop model selection techniques in abstract risk minimization problems that can be applied to more specialized frameworks of regression and classification.
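To make the objects in the abstract concrete, the following is a minimal Python sketch for a finite class, under stated assumptions. The function names `empirical_risk_minimizer` and `local_rademacher`, the matrix layout `values[j, i] = f_j(X_i)`, and the choice to localize around the empirical minimizer are all illustrative and do not come from the paper; the quantity computed is a simplified, purely empirical stand-in for a localized sup-norm of the Rademacher process, not the paper's exact definition.

```python
import numpy as np

def empirical_risk_minimizer(values):
    """values[j, i] = f_j(X_i) in [0, 1] (assumed layout).

    Returns the index of a minimizer of P_n f and the vector of empirical risks.
    """
    risks = values.mean(axis=1)          # P_n f_j for every j
    return int(np.argmin(risks)), risks

def local_rademacher(values, delta, n_draws=200, rng=None):
    """Monte Carlo estimate of E_eps sup |n^{-1} sum_i eps_i (f - f_hat)(X_i)|,
    with the sup taken over functions whose empirical excess risk is <= delta.

    Illustrative simplification: localization uses empirical risks only.
    """
    rng = np.random.default_rng(rng)
    j_hat, risks = empirical_risk_minimizer(values)
    local = values[risks - risks.min() <= delta]   # empirical delta-minimal set
    diffs = local - values[j_hat]                  # f - f_hat on the sample
    n = values.shape[1]
    eps = rng.choice([-1.0, 1.0], size=(n_draws, n))  # Rademacher signs
    sups = np.abs(eps @ diffs.T / n).max(axis=1)      # sup over the local set
    return float(sups.mean())
```

With a fixed seed, the estimate is nondecreasing in `delta`, since enlarging the localized set can only increase the supremum; this is the monotone behaviour that localized complexity bounds rely on.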
