Boosting with Diverse Base Classifiers

We establish a new bound on the generalization error rate of the Boost-by-Majority algorithm. The bound holds when the algorithm is applied to a collection of base classifiers that contains a “diverse” subset of “good” classifiers, in a precisely defined sense. We describe cross-validation experiments suggesting that Boost-by-Majority can be the basis of a practically useful learning method, often improving on the generalization performance of AdaBoost on large datasets.
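
As a concrete illustration of the algorithm the abstract refers to, below is a minimal Python sketch of Boost-by-Majority (Freund, 1995). The helper train_base, the edge parameter gamma, the round count T, and the +/-1 label convention are assumptions made for the sketch, not details taken from the paper; the binomial weighting follows the standard statement of the algorithm.

    import numpy as np
    from math import comb

    def bbm_weight(r, t, T, gamma):
        """Weight for an example classified correctly r times after t of T rounds.
        It equals the probability that the example lands exactly on the final
        majority-vote boundary if each remaining round behaved like an independent
        coin flip with success probability 1/2 + gamma (the assumed edge)."""
        k = T // 2 - r        # correct votes still needed to reach the boundary
        n = T - t - 1         # rounds remaining after the current one
        if k < 0 or k > n:
            return 0.0        # outcome already decided either way: zero weight
        return comb(n, k) * (0.5 + gamma) ** k * (0.5 - gamma) ** (n - k)

    def boost_by_majority(X, y, train_base, T=101, gamma=0.1):
        """Minimal Boost-by-Majority sketch; labels y are +/-1 and T is odd.
        train_base(X, y, weights) must return a classifier h with h(X) -> +/-1 array.
        Unlike AdaBoost, the final hypothesis is an *unweighted* majority vote."""
        correct = np.zeros(len(y), dtype=int)   # r_i: rounds that got example i right
        hypotheses = []
        for t in range(T):
            w = np.array([bbm_weight(r, t, T, gamma) for r in correct])
            if w.sum() == 0:                    # every example is already decided
                break
            hypotheses.append(train_base(X, y, w / w.sum()))
            correct += (hypotheses[-1](X) == y).astype(int)
        def majority_vote(X_new):
            votes = np.sum([h(X_new) for h in hypotheses], axis=0)
            return np.where(votes >= 0, 1, -1)
        return majority_vote

In the theory, gamma is the guaranteed edge of the weak learner on every reweighted sample; in a practical run of this sketch it would be a tuning parameter chosen by cross-validation.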
