Linear Programming Boosting via Column Generation

We examine linear programming (LP) approaches to boosting and demonstrate their efficient solution using LPBoost, a column-generation-based simplex method. We formulate the problem as if all possible weak hypotheses had already been generated. The labels produced by the weak hypotheses become the new feature space of the problem. The boosting task then becomes constructing a learning function in the label space that minimizes misclassification error and maximizes the soft margin. We prove that for classification, minimizing the 1-norm soft margin error function directly optimizes a generalization error bound. The equivalent linear program can be solved efficiently using column generation techniques developed for large-scale optimization problems. The resulting LPBoost algorithm can be used to solve any LP boosting formulation by iteratively optimizing the dual misclassification costs in a restricted LP and dynamically generating weak hypotheses to make new LP columns. We provide algorithms for soft margin classification, confidence-rated boosting, and regression boosting problems. Unlike gradient boosting algorithms, which may converge only in the limit, LPBoost converges in a finite number of iterations to a global solution satisfying mathematically well-defined optimality conditions. The optimal solutions of LPBoost are very sparse, in contrast with those of gradient-based methods. In practice, LPBoost is competitive with AdaBoost in both solution quality and computational cost.
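
To make the column generation idea concrete, the soft margin classification problem can be written as a primal/dual LP pair. This is a sketch in notation assumed here, not taken from the abstract: $m$ training points $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, weak hypotheses $h_1, \dots, h_n$, hypothesis weights $a_j$, slack variables $\xi_i$, margin $\rho$, dual misclassification costs $u_i$, and a penalty parameter $D > 0$.

```latex
% Primal: one column per weak hypothesis h_j (all symbols are assumed notation)
\begin{aligned}
\max_{a,\,\xi,\,\rho}\quad & \rho - D \sum_{i=1}^{m} \xi_i \\
\text{s.t.}\quad & y_i \sum_{j=1}^{n} a_j h_j(x_i) + \xi_i \ge \rho, \quad i = 1,\dots,m, \\
& \sum_{j=1}^{n} a_j = 1, \quad a_j \ge 0, \quad \xi_i \ge 0 .
\end{aligned}

% Dual: one constraint per weak hypothesis h_j
\begin{aligned}
\min_{u,\,\beta}\quad & \beta \\
\text{s.t.}\quad & \sum_{i=1}^{m} u_i y_i h_j(x_i) \le \beta, \quad j = 1,\dots,n, \\
& \sum_{i=1}^{m} u_i = 1, \quad 0 \le u_i \le D .
\end{aligned}
```

Under this formulation, a restricted LP keeps only the hypotheses generated so far. Given its dual solution $(u, \beta)$, the weak learner is asked for a hypothesis $h$ that maximizes $\sum_i u_i y_i h(x_i)$. If that value does not exceed $\beta$ (up to a tolerance), no dual constraint is violated and the restricted solution is optimal for the full problem. Otherwise $h$ is added as a new primal column and the LP is re-solved.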
