Learning with Square Loss: Localization through Offset Rademacher Complexity

We consider regression with square loss over general, possibly unbounded, classes of functions. We introduce a notion of offset Rademacher complexity that provides a transparent way to study localization, both in expectation and in high probability. For any (possibly non-convex) class, the excess loss of a two-step estimator is shown to be upper bounded by this offset complexity via a novel geometric inequality. When the class is convex, the estimator reduces to empirical risk minimization. The method recovers the results of \citet{RakSriTsy15} for the bounded case while also providing guarantees when the boundedness assumption is dropped.
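
For concreteness, here is a minimal sketch of the two central objects in our own notation (the symbols $\mathcal{F}$, $\hat{g}$, $\hat{f}$, $\mathrm{star}$, and the offset parameter $c$ are assumptions of this sketch, not fixed by the abstract). Given samples $(x_1, y_1), \dots, (x_n, y_n)$, the two-step estimator first performs empirical risk minimization over the class, then re-minimizes over the star hull around the first-stage solution:
\[
\hat{g} \in \operatorname*{arg\,min}_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \big( f(x_i) - y_i \big)^2,
\qquad
\hat{f} \in \operatorname*{arg\,min}_{f \in \mathrm{star}(\mathcal{F}, \hat{g})} \frac{1}{n} \sum_{i=1}^{n} \big( f(x_i) - y_i \big)^2,
\]
where $\mathrm{star}(\mathcal{F}, g) = \{\, g + \lambda (f - g) : f \in \mathcal{F},\ \lambda \in [0,1] \,\}$. For convex $\mathcal{F}$ the star hull coincides with $\mathcal{F}$, so the second step is vacuous and the procedure is plain empirical risk minimization. The offset Rademacher complexity of a class $\mathcal{G}$, with offset parameter $c > 0$ and i.i.d. Rademacher signs $\epsilon_1, \dots, \epsilon_n$, is
\[
\mathcal{R}_n^{\mathrm{off}}(\mathcal{G}, c) \;=\; \mathbb{E}_{\epsilon} \, \sup_{g \in \mathcal{G}} \, \frac{1}{n} \sum_{i=1}^{n} \Big( \epsilon_i \, g(x_i) - c \, g(x_i)^2 \Big).
\]
The negative quadratic term makes the supremum favor functions of small empirical norm, so the resulting bound localizes automatically, without a separate fixed-point (critical radius) computation.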

[1] S. Mendelson et al. Aggregation via empirical risk minimization, 2009.

[2] V. Koltchinskii et al. Oracle inequalities in empirical risk minimization and sparse recovery problems, 2011.

[3] V. Koltchinskii et al. Rademacher Processes and Bounding the Risk of Function Learning, 2004, arXiv:math/0405338.

[4] S. Mendelson et al. Learning subgaussian classes: Upper and minimax bounds, 2013, arXiv:1305.4825.

[5] E. Giné et al. Some Limit Theorems for Empirical Processes, 1984.

[6] Y. Yang et al. Information-theoretic determination of minimax rates of convergence, 1999.

[7] G. Lugosi et al. Concentration Inequalities: A Nonasymptotic Theory of Independence, 2013.

[8] S. Mendelson. On aggregation for heavy-tailed classes, 2015, Probability Theory and Related Fields.

[9] J.-Y. Audibert et al. Progressive mixture rules are deviation suboptimal, 2007, NIPS.

[10] D. Panchenko et al. Some Local Measures of Complexity of Convex Hulls and Generalization Bounds, 2002, COLT.

[11] K. Sridharan et al. Online Non-Parametric Regression, 2014, COLT.

[12] S. Mendelson et al. Learning without Concentration, 2014, COLT.

[13] P. Bartlett et al. Local Rademacher complexities, 2005, arXiv:math/0508275.

[14] S. Mendelson. Improving the sample complexity using global data, 2002, IEEE Transactions on Information Theory.

[15] S. Mendelson. A Few Notes on Statistical Learning Theory, 2002, Machine Learning Summer School.

[16] S. Mendelson. Learning without concentration for general loss functions, 2014, arXiv:1410.3192.

[17] O. Bousquet. Concentration Inequalities and Empirical Processes Theory Applied to the Analysis of Learning Algorithms, 2002.

[18] K. Sridharan et al. Empirical Entropy, Minimax Regret and Minimax Risk, 2013, arXiv.

[19] T. Zhang et al. Deviation Optimal Learning using Greedy Q-aggregation, 2012, arXiv.