Sparse Regression Ensembles in Infinite and Finite Hypothesis Spaces

We examine methods for constructing regression ensembles based on a linear program (LP). The ensemble regression function consists of linear combinations of base hypotheses generated by some boosting-type base learning algorithm. Unlike in classification, the set of hypotheses that the base learning algorithm can produce for regression may be infinite. We explicitly address how to define and solve ensemble regression when the hypothesis space is infinite. Our approach is based on a semi-infinite linear program, which has an infinite number of constraints and a finite number of variables. We show that the regression problem is well posed for infinite hypothesis spaces in both the primal and dual spaces. Most importantly, we prove that there exists an optimal solution to the infinite-hypothesis-space problem consisting of a finite number of hypotheses. We propose two algorithms for solving the infinite and finite hypothesis problems: one uses a column generation simplex-type algorithm, and the other adopts an exponential barrier approach. Furthermore, we give sufficient conditions on the base learning algorithm and the hypothesis set for use in infinite regression ensembles. Computational results show that these methods are extremely promising.
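
To make the column generation scheme concrete, here is a minimal sketch of LP-based ensemble regression. It illustrates the general idea rather than the paper's exact formulation: it assumes an L1-regularized least-absolute-deviation master LP, substitutes a finite pool of decision stumps (evaluated on the training sample) for the base learning algorithm, and relies on scipy's HiGHS solver; the names `stump_pool`, `solve_master`, and `lp_ensemble_regression` are hypothetical. Each round solves the restricted master LP over the hypotheses generated so far; the dual variables then define a pricing problem in which the base learner is asked for the hypothesis most correlated with the duals, and the loop stops once no hypothesis has negative reduced cost.

```python
# Hypothetical sketch of column-generation ensemble regression.
# Assumptions (not from the paper): L1-regularized least-absolute-deviation
# master LP, a finite stump pool standing in for the base learner, HiGHS.
import numpy as np
from scipy.optimize import linprog


def stump_pool(X):
    """Finite stand-in for the base hypothesis set: +/-1 decision stumps
    h(x) = sign(x_d - t), evaluated on the training points."""
    cols = []
    for d in range(X.shape[1]):
        for t in np.unique(X[:, d])[:-1]:
            cols.append(np.where(X[:, d] <= t, -1.0, 1.0))
    return np.array(cols)


def solve_master(H, y, C):
    """Restricted master LP: min sum(xi) s.t. |y - H @ alpha| <= xi,
    sum(alpha) <= C, alpha >= 0, xi >= 0.  Returns weights and duals."""
    n, k = H.shape
    c = np.concatenate([np.zeros(k), np.ones(n)])
    I = np.eye(n)
    A_ub = np.vstack([
        np.hstack([-H, -I]),     # y - H @ alpha <= xi
        np.hstack([H, -I]),      # H @ alpha - y <= xi
        np.concatenate([np.ones(k), np.zeros(n)])[None, :],  # sum(alpha) <= C
    ])
    b_ub = np.concatenate([-y, y, [C]])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    assert res.status == 0, res.message
    m = res.ineqlin.marginals    # duals of the '<=' rows, nonpositive here
    u = m[:n] - m[n:2 * n]       # combined dual of the two residual blocks
    return res.x[:k], u, m[-1]


def lp_ensemble_regression(X, y, C=10.0, max_iter=50, tol=1e-6):
    """Column generation: grow the restricted master one hypothesis at a
    time until no base hypothesis has negative reduced cost."""
    pool = stump_pool(X)
    H = np.ones((len(y), 1))     # start from the constant hypothesis
    for _ in range(max_iter):
        alpha, u, m_C = solve_master(H, y, C)
        scores = pool @ u        # pricing: correlation with the duals
        j = int(np.argmax(np.abs(scores)))
        # Best signed column has reduced cost -|h . u| - m_C (m_C <= 0);
        # if that is nonnegative, the restricted LP solves the full problem.
        if -np.abs(scores[j]) - m_C >= -tol:
            break
        H = np.column_stack([H, -np.sign(scores[j]) * pool[j]])
    alpha, _, _ = solve_master(H, y, C)
    return alpha, H              # sparse weights over the generated columns


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(80, 2))
    y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(80)
    alpha, H = lp_ensemble_regression(X, y)
    print("active hypotheses:", int(np.sum(alpha > 1e-8)), "of", H.shape[1])
```

One detail worth noting: the sketch works transductively with hypothesis evaluations on the training sample, so a full implementation would also retain each stump's parameters in order to evaluate the ensemble on new points. Sparsity comes for free at an LP vertex, where most entries of alpha are exactly zero, which is what makes the resulting ensembles sparse.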
