Chaining Bounds for Empirical Risk Minimization

This paper extends the standard chaining technique to prove excess risk upper bounds for empirical risk minimization in the random design setting, even when the magnitude of the noise and of the estimates is unbounded. The bound applies to many loss functions besides the squared loss, and scales only with the sub-Gaussian or subexponential parameters of the problem, without further statistical assumptions such as a bounded kurtosis condition over the hypothesis class. A detailed analysis is provided for slope-constrained and penalized linear least squares regression in a sub-Gaussian setting, where the technique often yields sample complexity bounds that are tight up to logarithmic factors.
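To fix ideas on the setting the detailed analysis covers, the following is a minimal numerical sketch (not the paper's method) of penalized linear least squares regression under a sub-Gaussian random design. All names, the dimensions, and the penalty value `lam = 1.0` are illustrative assumptions; the excess squared-loss risk is computed in closed form using the fact that the design covariance is the identity here.

```python
import numpy as np

# Illustrative sketch: ridge regression as penalized linear least
# squares ERM under a sub-Gaussian (here Gaussian) random design.
# Parameter choices (n, d, lam) are assumptions for the example only.

rng = np.random.default_rng(0)
n, d = 200, 10
theta_star = rng.normal(size=d) / np.sqrt(d)   # true parameter

X = rng.normal(size=(n, d))                    # sub-Gaussian covariates
y = X @ theta_star + rng.normal(size=n)        # sub-Gaussian noise

lam = 1.0                                      # assumed ridge penalty
# Penalized ERM: argmin_theta (1/n) * ||y - X theta||^2 + lam * ||theta||^2
theta_hat = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

# Excess risk of the squared loss in the random design setting:
# E[(x' theta_hat - x' theta_star)^2] = ||theta_hat - theta_star||^2,
# since Cov(x) = I for this design.
excess_risk = np.sum((theta_hat - theta_star) ** 2)
print(f"empirical excess risk: {excess_risk:.4f}")
```

Rerunning the sketch with larger n shows the excess risk shrinking, consistent with the kind of sample complexity behavior the paper's bounds quantify.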
