Deep Boosting

We present a new ensemble learning algorithm, DeepBoost, which can use as base classifiers a hypothesis set containing deep decision trees, or members of other rich or complex families, and still achieve high accuracy without overfitting the data. The key to the success of the algorithm is a capacity-conscious criterion for the selection of the hypotheses. We give new data-dependent learning bounds for convex ensembles, expressed in terms of the Rademacher complexities of the sub-families composing the base classifier set and the mixture weight assigned to each sub-family. Our algorithm directly benefits from these guarantees since it seeks to minimize the corresponding learning bound. We give a full description of our algorithm, including the details of its derivation, and report the results of several experiments showing that its performance compares favorably to that of AdaBoost, Logistic Regression, and their L1-regularized variants.
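
In the analysis described above, the generalization bound for an ensemble built from several sub-families penalizes each base classifier by the Rademacher complexity of the sub-family it is drawn from, weighted by its mixture weight, rather than by the complexity of the union of all sub-families. The sketch below is not the authors' exact DeepBoost update; it is a minimal illustration, in Python, of what a capacity-conscious selection step of this kind can look like. The helper name select_base_classifier and the trade-off parameter lam are hypothetical, introduced only for this example.

    # Illustrative sketch (not the authors' exact algorithm): score each
    # candidate base classifier by its weighted error under the current
    # boosting distribution, plus a penalty proportional to the Rademacher
    # complexity of the sub-family it belongs to (e.g. deeper decision
    # trees carry a larger penalty).
    import numpy as np

    def select_base_classifier(candidates, complexities, X, y, D, lam=1.0):
        """Return the index of the candidate with the smallest penalized score.

        candidates   -- list of callables h(X) -> predictions in {-1, +1}
        complexities -- estimated Rademacher complexity of each candidate's
                        sub-family
        X, y         -- training examples and labels in {-1, +1}
        D            -- current distribution (weights) over training examples
        lam          -- hypothetical trade-off between error and capacity
        """
        best_score, best_idx = np.inf, None
        for j, h in enumerate(candidates):
            weighted_error = np.sum(D * (h(X) != y))
            score = weighted_error + lam * complexities[j]  # capacity-conscious criterion
            if score < best_score:
                best_score, best_idx = score, j
        return best_idx

Under this kind of criterion, a deep tree is selected over a shallow one only when its reduction in weighted error outweighs its larger capacity penalty, which is the intuition behind avoiding overfitting while still admitting rich base classifiers.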
