On the equivalence of weak learnability and linear separability: new relaxations and efficient boosting algorithms

Boosting algorithms build highly accurate prediction mechanisms from a collection of low-accuracy predictors. To do so, they employ the notion of weak learnability. The starting point of this paper is a simple proof showing that weak learnability is equivalent to linear separability with ℓ1 margin. The equivalence is a direct consequence of von Neumann’s minimax theorem. Nevertheless, we derive the equivalence directly using Fenchel duality. We then use our derivation to describe a family of relaxations of the weak-learnability assumption that readily translates into a family of relaxations of linear separability with margin. This alternative perspective sheds new light on known soft-margin boosting algorithms and also enables us to derive several new relaxations of the notion of linear separability. Finally, we describe and analyze an efficient boosting framework that can be used for minimizing the loss functions derived from our family of relaxations. In particular, we obtain efficient boosting algorithms for maximizing hard and soft versions of the ℓ1 margin.
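The equivalence the abstract refers to can be sketched with a standard minimax argument; the notation below is illustrative and not necessarily the paper's own. Given a sample $(x_1,y_1),\dots,(x_m,y_m)$ with labels $y_i \in \{\pm 1\}$ and a finite hypothesis class $\{h_1,\dots,h_n\}$, von Neumann's minimax theorem applied to the game matrix $A_{ij} = y_i\,h_j(x_i)$ yields

```latex
\min_{d \in \Delta_m} \; \max_{1 \le j \le n} \; \sum_{i=1}^{m} d_i \, y_i \, h_j(x_i)
\;=\;
\max_{w \in \Delta_n} \; \min_{1 \le i \le m} \; y_i \sum_{j=1}^{n} w_j \, h_j(x_i) .
```

The left-hand side is the best edge a weak learner can guarantee against the worst-case distribution $d$ over examples; the right-hand side is the largest margin attainable by a convex combination of hypotheses. Assuming the class is closed under negation, maximizing over the simplex $\Delta_n$ coincides with maximizing over the unit ℓ1 ball, so a weak-learnability edge of $\gamma > 0$ holds if and only if the data is linearly separable with ℓ1 margin $\gamma$.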

[1] J. von Neumann. Zur Theorie der Gesellschaftsspiele, 1928.

[2] Yoav Freund, et al. Boosting a weak learning algorithm by majority, 1995, COLT '90.

[3] Yoav Freund, et al. Game theory, on-line prediction and boosting, 1996, COLT '96.

[4] Yoav Freund, et al. Boosting the margin: A new explanation for the effectiveness of voting methods, 1997, ICML.

[5] Vladimir Vapnik, et al. Statistical learning theory, 1998.

[6] Peter L. Bartlett, et al. Direct Optimization of Margins Improves Generalization in Combined Classifiers, 1998, NIPS.

[7] Yoav Freund, et al. An Adaptive Version of the Boost by Majority Algorithm, 1999, COLT '99.

[8] Yoav Freund, et al. A Short Introduction to Boosting, 1999.

[9] Adrian S. Lewis, et al. Convex Analysis and Nonlinear Optimization, 2000.

[10] Dmitry Panchenko, et al. Some New Bounds on the Generalization Error of Combined Classifiers, 2000, NIPS.

[11] Bernhard Schölkopf, et al. New Support Vector Algorithms, 2000, Neural Computation.

[12] Osamu Watanabe, et al. MadaBoost: A Modification of AdaBoost, 2000, COLT.

[13] Andrzej Stachurski, et al. Parallel Optimization: Theory, Algorithms and Applications, 2000, Parallel Distributed Comput. Pract.

[14] Mark Herbster, et al. Tracking the Best Linear Predictor, 2001, J. Mach. Learn. Res.

[15] Gunnar Rätsch, et al. An Introduction to Boosting and Leveraging, 2002, Machine Learning Summer School.

[16] Dustin Boswell, et al. Introduction to Support Vector Machines, 2002.

[17] Tong Zhang, et al. Sequential greedy approximation for certain convex optimization problems, 2003, IEEE Trans. Inf. Theory.

[18] Robert E. Schapire, et al. The Boosting Approach to Machine Learning: An Overview, 2003.

[19] David D. Denison, et al. Nonlinear estimation and classification, 2003.

[20] Rocco A. Servedio, et al. Smooth Boosting and Learning with Malicious Noise, 2001, J. Mach. Learn. Res.

[21] Yoram Singer, et al. Logistic Regression, AdaBoost and Bregman Distances, 2000, Machine Learning.

[22] Petros Drineas, et al. On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning, 2005, J. Mach. Learn. Res.

[23] Gunnar Rätsch, et al. Efficient Margin Maximizing with Boosting, 2005, J. Mach. Learn. Res.

[24] R. Schapire. The Strength of Weak Learnability, 1990, Machine Learning.

[25] Yoram Singer, et al. Convex Repeated Games and Fenchel Duality, 2006, NIPS.

[26] Stephen P. Boyd, et al. Convex Optimization, 2004, Cambridge University Press.

[27] Yoram Singer, et al. Efficient Learning of Label Ranking by Soft Projections onto Polyhedra, 2006, J. Mach. Learn. Res.

[28] Gunnar Rätsch, et al. Totally corrective boosting algorithms that maximize the margin, 2006, ICML.

[29] Yoram Singer, et al. A primal-dual perspective of online learning algorithms, 2007, Machine Learning.

[30] Gunnar Rätsch, et al. Boosting Algorithms for Maximizing the Soft Margin, 2007, NIPS.

[31] Alexander J. Smola, et al. Bundle Methods for Machine Learning, 2007, NIPS.

[32] R. Schapire, et al. Analysis of boosting algorithms using the smooth margin function, 2007, arXiv:0803.4092.

[33] Shai Shalev-Shwartz. Online Learning: Theory, Algorithms and Applications, 2007.

[34] S. V. N. Vishwanathan, et al. Entropy Regularized LPBoost, 2008, ALT.

[35] Yoram Singer, et al. Efficient projections onto the ℓ1-ball for learning in high dimensions, 2008, ICML '08.