Paper Information - Simplified PAC-Bayesian Margin Bounds

Simplified PAC-Bayesian Margin Bounds

The theoretical understanding of support vector machines is largely based on margin bounds for linear classifiers with unit-norm weight vectors and unit-norm feature vectors. Unit-norm margin bounds have previously been proved using fat-shattering arguments and Rademacher complexity. Recently, Langford and Shawe-Taylor proved a dimension-independent unit-norm margin bound using a relatively simple PAC-Bayesian argument. Unfortunately, the Langford-Shawe-Taylor bound is stated in a variational form, which makes direct comparison to fat-shattering bounds difficult. This paper provides an explicit solution to the variational problem implicit in the Langford-Shawe-Taylor bound and shows that the PAC-Bayesian margin bounds are significantly tighter. Because a PAC-Bayesian bound is derived from a particular prior distribution over hypotheses, a PAC-Bayesian margin bound also seems to provide insight into the nature of the learning bias underlying the bound.

David A. McAllester
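
To make the unit-norm setting in the abstract concrete, the following is a minimal sketch of the quantity that margin bounds are stated in terms of: the fraction of training examples whose margin falls below a threshold. It does not reproduce the bound from the paper. The function names, the toy data, the threshold name `gamma`, and the choice of a strict inequality are all illustrative assumptions.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def margin(w, x, y):
    """Signed margin y * <w, x> after normalizing w and x; lies in [-1, 1]."""
    w_hat, x_hat = normalize(w), normalize(x)
    return y * sum(a * b for a, b in zip(w_hat, x_hat))


def empirical_margin_error(w, data, gamma):
    """Fraction of labeled examples (x, y) with margin strictly below gamma.

    Assumption: a strict inequality is used here; some texts use <= instead.
    """
    return sum(1 for x, y in data if margin(w, x, y) < gamma) / len(data)


# Hypothetical toy data: (feature vector, label in {-1, +1}).
data = [([3.0, 4.0], 1), ([1.0, 0.0], 1), ([-1.0, 0.0], -1), ([0.0, -2.0], 1)]
w = [1.0, 0.0]

print(empirical_margin_error(w, data, gamma=0.5))
```

Because both the weight vector and the feature vectors are normalized, the margin is the cosine of the angle between them (signed by the label), so a threshold such as `gamma=0.5` has the same meaning regardless of how the raw features are scaled.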
