VIF Regression: A Fast Regression Algorithm for Large Data

We propose a fast and accurate algorithm, VIF regression, for feature selection in large regression problems. VIF regression is extremely fast: it uses a one-pass search over the predictors and a computationally efficient method for testing each potential predictor before adding it to the model. VIF regression provably avoids overfitting by controlling the marginal false discovery rate. Numerical results show that it is much faster than other published algorithms for regression with feature selection, while being as accurate as the best of the slower algorithms.
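The one-pass search and fast test described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `vif_regress` helper, the subsample size, and the alpha-investing constants are assumptions. Each candidate predictor gets an approximate t-test whose marginal statistic is deflated by a variance inflation factor (VIF) estimated on a small subsample, and acceptance is governed by an alpha-investing wealth rule that controls the marginal false discovery rate.

```python
import numpy as np
from math import erfc, sqrt

def vif_regress(X, y, w0=0.5, payout=0.05, sub=200, seed=0):
    """Sketch of VIF-regression-style streamwise feature selection.

    Illustrative only: constants and structure are assumptions, and
    columns of X are assumed to be roughly standardized.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    yc = y - y.mean()                 # center the response (implicit intercept)
    resid = yc.copy()
    selected = []
    wealth = w0
    idx = rng.choice(n, size=min(sub, n), replace=False)  # subsample for VIF

    for j in range(p):                # single pass over candidate predictors
        x = X[:, j]
        # marginal t-statistic of x against the current residual
        gamma = (x @ resid) / (x @ x)
        sigma2 = (resid @ resid) / max(n - len(selected) - 1, 1)
        t = gamma * sqrt(x @ x) / sqrt(sigma2)

        # cheap VIF = 1/(1 - R^2) of x regressed on the selected set,
        # computed on the small subsample only
        if selected:
            Xs = X[np.ix_(idx, selected)]
            coef, *_ = np.linalg.lstsq(Xs, x[idx], rcond=None)
            r = x[idx] - Xs @ coef
            tot = x[idx] - x[idx].mean()
            r2 = 1.0 - (r @ r) / max(tot @ tot, 1e-12)
            vif = 1.0 / max(1.0 - r2, 1e-12)
        else:
            vif = 1.0

        # two-sided normal p-value of the VIF-corrected statistic
        pval = erfc(abs(t / sqrt(vif)) / sqrt(2))

        alpha = wealth / (2.0 * (j + 1))     # bid a share of current wealth
        if pval < alpha:
            selected.append(j)
            wealth += payout                 # alpha-investing: reward a discovery
            coef, *_ = np.linalg.lstsq(X[:, selected], yc, rcond=None)
            resid = yc - X[:, selected] @ coef
        else:
            wealth -= alpha / (1.0 - alpha)  # pay for the failed test
    return selected
```

The key speed-up is that the expensive multiple-regression t-test is replaced by a marginal test corrected by a subsample VIF estimate, so each candidate costs O(n) plus a small subsample regression rather than a full refit.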
