On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization

This work characterizes the generalization ability of algorithms whose predictions are linear in the input vector. To this end, we provide sharp bounds on the Rademacher and Gaussian complexities of (constrained) linear classes, which directly lead to a number of generalization bounds. This derivation yields simplified proofs of several corollaries, including: risk bounds for linear prediction (including settings where the weight vectors are constrained in either the L2 or L1 norm), margin bounds (for both L2 and L1 margins, along with more general notions based on relative entropy), a proof of the PAC-Bayes theorem, and upper bounds on L2 covering numbers (under Lp norm constraints and relative entropy constraints). In addition to providing a unified analysis, the results herein provide some of the sharpest risk and margin bounds. Interestingly, our results show that the uniform convergence rates of empirical risk minimization algorithms tightly match the regret bounds of online learning algorithms for linear prediction, up to a constant factor of 2.
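
As an illustration of the standard form such complexity bounds take (a minimal sketch using textbook constants; the paper's exact statements and constants may differ), consider the L2- and L1-constrained linear classes. For $\mathcal{F}_2 = \{x \mapsto \langle w, x \rangle : \|w\|_2 \le W_2\}$ with examples satisfying $\|x_i\|_2 \le X_2$, the empirical Rademacher complexity obeys

\[
\hat{\mathfrak{R}}_n(\mathcal{F}_2) \;=\; \mathbb{E}_{\sigma}\!\left[ \sup_{\|w\|_2 \le W_2} \frac{1}{n} \sum_{i=1}^{n} \sigma_i \langle w, x_i \rangle \right] \;\le\; \frac{X_2 W_2}{\sqrt{n}},
\]

while for $\mathcal{F}_1 = \{x \mapsto \langle w, x \rangle : \|w\|_1 \le W_1\}$ in $\mathbb{R}^d$ with $\|x_i\|_\infty \le X_\infty$,

\[
\hat{\mathfrak{R}}_n(\mathcal{F}_1) \;\le\; X_\infty W_1 \sqrt{\frac{2 \log (2d)}{n}}.
\]

Plugging either bound into a standard Rademacher-complexity generalization bound gives excess-risk rates of order $X_2 W_2 / \sqrt{n}$ and $X_\infty W_1 \sqrt{\log d / n}$, respectively, illustrating the dimension-free (L2) versus logarithmically dimension-dependent (L1) behavior typical of these two regularization regimes.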
