An Improved Predictive Accuracy Bound for Averaging Classifiers

We present an improved bound on the difference between the training and test errors of voting classifiers. This improved averaging bound provides a theoretical justification for popular averaging techniques such as Bayesian classification, Maximum Entropy discrimination, Winnow, and Bayes point machines, and it has implications for the design of learning algorithms.
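For orientation, bounds of this kind typically take a PAC-Bayesian form. The following is a minimal sketch under assumed notation, not the paper's exact theorem: fix a prior distribution P over classifiers before seeing the data, draw m i.i.d. training examples, and let Q be any posterior distribution chosen after seeing the data. Then with probability at least 1 - \delta, simultaneously for all Q,

\[
  \mathrm{KL}\!\left(\hat{e}_Q \,\middle\|\, e_Q\right)
  \;\le\;
  \frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{m+1}{\delta}}{m},
  \qquad
  \mathrm{KL}(q \,\|\, p) \;=\; q \ln\frac{q}{p} + (1-q)\ln\frac{1-q}{1-p},
\]

where \hat{e}_Q and e_Q denote the expected training and test error rates of a classifier drawn at random from Q (the Gibbs classifier). The train-test gap is thus controlled by the divergence KL(Q || P) rather than by the complexity of any single classifier, which is what favors averaging methods: a bound for the Q-weighted voting classifier follows, since its error rate is at most twice that of the Gibbs classifier.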
