A Duality View of Boosting Algorithms

We study boosting algorithms from a new perspective. We show that the Lagrange dual problems of AdaBoost, LogitBoost and soft-margin LPBoost with generalized hinge loss are all entropy maximization problems. By examining these dual problems, we show that the success of boosting algorithms can be understood in terms of maintaining a better margin distribution: they maximize margins while at the same time controlling the margin variance. We also prove theoretically that AdaBoost approximately maximizes the average margin rather than the minimum margin. The duality formulation also enables us to develop column-generation-based optimization algorithms, which are totally corrective. We show that their classification results are almost identical to those of standard stage-wise additive boosting algorithms, but with much faster convergence rates. Consequently, fewer weak classifiers are needed to build the ensemble using our proposed optimization technique.
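To make the margin-distribution quantities mentioned above concrete, the following is a minimal sketch that computes the minimum, average, and variance of the margins of a weighted ensemble. It assumes the commonly used normalized margin, y_i * sum_j w_j h_j(x_i) / sum_j w_j, which may differ from the exact normalization used in the paper. The function names and the toy data are hypothetical and chosen only for illustration; this is not the paper's optimization algorithm.

```python
from statistics import mean, pvariance


def normalized_margins(H, w, y):
    """Return the normalized margin of each training example.

    H: list of rows, H[i][j] is the output (+1 or -1) of weak classifier j
       on example i (hypothetical toy representation).
    w: non-negative ensemble weights, one per weak classifier.
    y: labels in {+1, -1}, one per example.
    """
    total = sum(w)
    if total <= 0:
        raise ValueError("ensemble weights must have a positive sum")
    return [
        y_i * sum(w_j * h_ij for w_j, h_ij in zip(w, row)) / total
        for row, y_i in zip(H, y)
    ]


def margin_summary(margins):
    """Summarize a margin distribution by its minimum, mean, and variance."""
    return {
        "min": min(margins),
        "mean": mean(margins),
        "variance": pvariance(margins),  # population variance over the sample
    }


# Toy data (assumed for illustration): 4 examples, 3 weak classifiers.
H = [
    [1, 1, -1],
    [1, -1, 1],
    [-1, -1, 1],
    [-1, 1, -1],
]
y = [1, 1, -1, -1]
w = [0.5, 0.3, 0.2]

margins = normalized_margins(H, w, y)
summary = margin_summary(margins)
print(summary)
```

Comparing such summaries for two ensembles trained on the same data is one simple way to see whether an ensemble achieves a larger average margin, a smaller margin variance, or only a larger minimum margin.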
