A branch and bound algorithm for computing the best subset regression models

An efficient branch-and-bound algorithm for computing the best-subset regression models is proposed. The algorithm avoids computing the whole regression tree that generates all possible subset models. It is formally shown that if the branch-and-bound test holds, then the current subtree together with its right-hand subtrees can be cut. This significantly reduces the computational burden of the proposed algorithm compared with an existing leaps-and-bounds method, which generates two trees. Specifically, the proposed algorithm, which is based on orthogonal transformations, outperforms the leaps-and-bounds strategy by O(n³). The criteria used to identify the best subsets are monotone functions of the residual sum of squares (RSS), such as R², adjusted R², the mean square error of prediction, and Mallows' Cp. Strategies and heuristics that improve the computational performance of the proposed algorithm are investigated. A computationally efficient heuristic version of the branch-and-bound strategy, which decides whether to cut subtrees using a tolerance parameter, is also proposed. The heuristic algorithm derives models close to the best ones. Moreover, it is shown analytically that the relative error of the RSS, and consequently of the corresponding statistic, of the computed subsets is smaller than the value of the tolerance parameter, which lies between zero and one.
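The pruning logic can be illustrated with a minimal Python sketch (the names best_subset_bb and rss are hypothetical, not from the paper). Because deleting a variable can never decrease the RSS, the RSS of the full model at a node is a lower bound on the RSS of every submodel in its subtree; a subtree is cut when this bound cannot improve the best model found so far for any attainable subset size. Unlike the paper's algorithm, which moves between nodes cheaply via orthogonal (QR/Givens) updates, the sketch refits each model from scratch with numpy.linalg.lstsq for clarity, and the relaxed test (1 + tol) * bound >= best is one way to realize the tolerance idea, yielding a relative RSS error of at most tol.

```python
import numpy as np


def rss(X, y, cols):
    """RSS of the least-squares fit of y on the given columns of X.

    An intercept, if wanted, is assumed to be a column of X.
    """
    if not cols:
        return float(y @ y)  # empty model: no regressors
    beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    resid = y - X[:, cols] @ beta
    return float(resid @ resid)


def best_subset_bb(X, y, tol=0.0):
    """Best (lowest-RSS) variable subset of each size 1..p.

    Sketch of a branch-and-bound search over the regression tree:
    a node fixes some variables and branches on the remaining
    candidates.  With tol = 0 the search is exact; with tol in
    (0, 1) subtrees are cut more aggressively, but the RSS of each
    reported subset stays within a factor (1 + tol) of the optimum.
    """
    p = X.shape[1]
    best = {k: (np.inf, None) for k in range(1, p + 1)}

    def visit(fixed, cands):
        k = len(fixed)
        if k:  # record the model consisting of the fixed variables
            r = rss(X, y, fixed)
            if r < best[k][0]:
                best[k] = (r, tuple(fixed))
        if not cands:
            return
        # RSS of the full model at this node: a lower bound for the
        # RSS of every model in the subtree (monotonicity of the RSS)
        bound = rss(X, y, fixed + cands)
        # cut the subtree if no remaining size can improve by more than tol
        if all((1.0 + tol) * bound >= best[s][0]
               for s in range(k + 1, k + len(cands) + 1)):
            return
        # branch: take cands[i]; its left siblings are dropped for good,
        # so each subset is enumerated exactly once
        for i in range(len(cands)):
            visit(fixed + [cands[i]], cands[i + 1:])

    visit([], list(range(p)))
    return best
```

For instance, best_subset_bb(X, y)[2] returns the RSS and column indices of the best two-variable model, and passing tol=0.05 trades exactness for speed while guaranteeing the reported RSS is within 5% of the optimal one.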
