VIF Regression: A Fast Regression Algorithm for Large Data

We propose a fast and accurate algorithm, VIF regression, for feature selection in large regression problems. VIF regression is extremely fast: it uses a one-pass search over the predictors and a computationally efficient method for testing each potential predictor before adding it to the model. VIF regression provably avoids overfitting by controlling the marginal false discovery rate. Numerical results show that it is much faster than other published algorithms for regression with feature selection, while being as accurate as the best of the slower algorithms.
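The one-pass search and fast test described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `vif_regress` helper, the subsample size, and the alpha-investing constants are assumptions. Each candidate predictor gets an approximate t-test whose marginal statistic is deflated by a variance inflation factor (VIF) estimated on a small subsample, and acceptance is governed by an alpha-investing wealth rule that controls the marginal false discovery rate.

```python
import numpy as np
from math import erfc, sqrt

def vif_regress(X, y, w0=0.5, payout=0.05, sub=200, seed=0):
    """Sketch of VIF-regression-style streamwise feature selection.

    Illustrative only: constants and structure are assumptions, and
    columns of X are assumed to be roughly standardized.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    yc = y - y.mean()                 # center the response (implicit intercept)
    resid = yc.copy()
    selected = []
    wealth = w0
    idx = rng.choice(n, size=min(sub, n), replace=False)  # subsample for VIF

    for j in range(p):                # single pass over candidate predictors
        x = X[:, j]
        # marginal t-statistic of x against the current residual
        gamma = (x @ resid) / (x @ x)
        sigma2 = (resid @ resid) / max(n - len(selected) - 1, 1)
        t = gamma * sqrt(x @ x) / sqrt(sigma2)

        # cheap VIF = 1/(1 - R^2) of x regressed on the selected set,
        # computed on the small subsample only
        if selected:
            Xs = X[np.ix_(idx, selected)]
            coef, *_ = np.linalg.lstsq(Xs, x[idx], rcond=None)
            r = x[idx] - Xs @ coef
            tot = x[idx] - x[idx].mean()
            r2 = 1.0 - (r @ r) / max(tot @ tot, 1e-12)
            vif = 1.0 / max(1.0 - r2, 1e-12)
        else:
            vif = 1.0

        # two-sided normal p-value of the VIF-corrected statistic
        pval = erfc(abs(t / sqrt(vif)) / sqrt(2))

        alpha = wealth / (2.0 * (j + 1))     # bid a share of current wealth
        if pval < alpha:
            selected.append(j)
            wealth += payout                 # alpha-investing: reward a discovery
            coef, *_ = np.linalg.lstsq(X[:, selected], yc, rcond=None)
            resid = yc - X[:, selected] @ coef
        else:
            wealth -= alpha / (1.0 - alpha)  # pay for the failed test
    return selected
```

The key speed-up is that the expensive multiple-regression t-test is replaced by a marginal test corrected by a subsample VIF estimate, so each candidate costs O(n) plus a small subsample regression rather than a full refit.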
