Efficient Margin Maximizing with Boosting

AdaBoost produces a linear combination of base hypotheses and predicts with the sign of this linear combination. The linear combination may be viewed as a hyperplane in feature space, where the base hypotheses form the features. It has been observed that the generalization error of the algorithm continues to improve even after all examples are on the correct side of the current hyperplane. This improvement is attributed to the experimental observation that the distances (margins) of the examples to the separating hyperplane keep increasing even after all examples are on the correct side. We introduce a new version of AdaBoost, called AdaBoost*ν, that explicitly maximizes the minimum margin of the examples up to a given precision. The algorithm incorporates a current estimate of the achievable margin into its calculation of the linear coefficients of the base hypotheses. The bound on the number of iterations needed by the new algorithm is the same as the number needed by a known version of AdaBoost that must receive an explicit estimate of the achievable margin as a parameter. We also illustrate experimentally that our algorithm requires considerably fewer iterations than other algorithms that aim to maximize the margin.
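
The key change relative to plain AdaBoost is the coefficient update: each round's coefficient combines the edge of the current base hypothesis with a running estimate of the achievable margin (the smallest edge seen so far, reduced by the precision parameter ν). The following is a minimal Python sketch of that idea; the function names and the base-learner interface are illustrative assumptions, not the authors' code, and it omits the stopping criterion and numerical safeguards of the full algorithm.

```python
import numpy as np

def adaboost_star_nu(X, y, base_learner, nu=0.1, n_rounds=100):
    """Minimal sketch of the AdaBoost*_nu update (assumed interface).

    X: (n, d) feature matrix; y: labels in {-1, +1}.
    base_learner(X, y, d) is assumed to return a hypothesis h
    with h(X) in {-1, +1}^n, trained on distribution d.
    Assumes no base hypothesis is perfect (|edge| < 1), so the
    logarithms below stay finite.
    """
    n = len(y)
    d = np.full(n, 1.0 / n)          # distribution over training examples
    hypotheses, alphas = [], []
    min_edge = 1.0                   # running minimum of the observed edges

    for _ in range(n_rounds):
        h = base_learner(X, y, d)
        pred = h(X)
        gamma = np.dot(d, y * pred)  # edge of h under the current distribution

        # Current estimate of the achievable margin: smallest edge
        # observed so far, minus the precision parameter nu.
        min_edge = min(min_edge, gamma)
        rho_hat = min_edge - nu

        # Coefficient incorporates the margin estimate; plain AdaBoost
        # would use only the first log term.
        alpha = 0.5 * (np.log((1 + gamma) / (1 - gamma))
                       - np.log((1 + rho_hat) / (1 - rho_hat)))

        hypotheses.append(h)
        alphas.append(alpha)

        # Reweight examples: smaller margin -> larger weight.
        d *= np.exp(-alpha * y * pred)
        d /= d.sum()

    def combined(Xq):
        # Predict with the sign of the linear combination.
        scores = sum(a * h(Xq) for a, h in zip(alphas, hypotheses))
        return np.sign(scores)

    return combined
```

In this sketch, examples whose margin falls below the estimate ρ̂ receive extra weight, which is what drives the minimum margin toward the optimum up to the precision ν.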
