Simplified PAC-Bayesian Margin Bounds

The theoretical understanding of support vector machines is largely based on margin bounds for linear classifiers with unit-norm weight vectors and unit-norm feature vectors. Unit-norm margin bounds have been proved previously using fat-shattering arguments and Rademacher complexity. Recently Langford and Shawe-Taylor proved a dimension-independent unit-norm margin bound using a relatively simple PAC-Bayesian argument. Unfortunately, the Langford-Shawe-Taylor bound is stated in a variational form making direct comparison to fat-shattering bounds difficult. This paper provides an explicit solution to the variational problem implicit in the Langford-Shawe-Taylor bound and shows that the PAC-Bayesian margin bounds are significantly tighter. Because a PAC-Bayesian bound is derived from a particular prior distribution over hypotheses, a PAC-Bayesian margin bound also seems to provide insight into the nature of the learning bias underlying the bound.

[1]  H. Ruben A New Asymptotic Expansion for the Normal Probability Integral and Mill's Ratio , 1962 .

[2]  W. Hoeffding Probability Inequalities for sums of Bounded Random Variables , 1963 .

[3]  V. Rich Personal communication , 1989, Nature.

[4]  John Shawe-Taylor,et al.  A framework for structural risk minimisation , 1996, COLT '96.

[5]  Yoav Freund,et al.  Boosting the margin: A new explanation for the effectiveness of voting methods , 1997, ICML.

[6]  Peter L. Bartlett,et al.  The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network , 1998, IEEE Trans. Inf. Theory.

[7]  John Shawe-Taylor,et al.  Structural Risk Minimization Over Data-Dependent Hierarchies , 1998, IEEE Trans. Inf. Theory.

[8]  John Shawe-Taylor,et al.  Generalization Performance of Support Vector Machines and Other Pattern Classifiers , 1999 .

[9]  B. Schölkopf,et al.  Advances in kernel methods: support vector learning , 1999 .

[10]  Peter L. Bartlett,et al.  Rademacher and Gaussian Complexities: Risk Bounds and Structural Results , 2003, J. Mach. Learn. Res..

[11]  John Langford,et al.  An Improved Predictive Accuracy Bound for Averaging Classifiers , 2001, ICML.

[12]  Matthias W. Seeger,et al.  PAC-Bayesian Generalisation Error Bounds for Gaussian Process Classification , 2003, J. Mach. Learn. Res..

[13]  Christopher K. I. Williams,et al.  Advances in Neural Information Processing Systems 15 (NIPS 2002) , 2002 .

[14]  V. Koltchinskii,et al.  Empirical margin distributions and bounding the generalization error of combined classifiers , 2002, math/0405343.

[15]  Ronen I. Brafman,et al.  R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..

[16]  John Shawe-Taylor,et al.  PAC Bayes and Margins , 2003 .

[17]  Luis E. Ortiz,et al.  Concentration Inequalities for the Missing Mass and for Histogram Rule Error , 2003, J. Mach. Learn. Res..

[18]  David A. McAllester PAC-Bayesian Stochastic Model Selection , 2003, Machine Learning.