Relative Deviation Margin Bounds

We present a series of new and more favorable margin-based learning guarantees that depend on the empirical margin loss of a predictor. We give two types of learning bounds, both data-dependent and valid for general families, expressed in terms of either the Rademacher complexity or the empirical $\ell_\infty$ covering number of the hypothesis set used. We also briefly highlight several applications of these bounds and discuss their connection with existing results.
