Sure independence screening in generalized linear models with NP-dimensionality

Ultrahigh dimensional variable selection plays an increasingly important role in contemporary scientific discoveries and statistical research. Among others, Fan and Lv (2008) proposed an independence screening framework based on ranking marginal correlations. They showed that the correlation ranking procedure possesses a sure independence screening property in the context of the linear model with Gaussian covariates and responses. In this paper, we propose a more general version of independence learning for generalized linear models that ranks the maximum marginal likelihood estimates or the maximum marginal likelihood itself. We show that the proposed methods, which include Fan and Lv (2008) as a very special case, also possess the sure screening property with vanishing false selection rate. The conditions under which independence learning possesses the sure screening property are surprisingly simple, which justifies the applicability of such a simple method in a wide spectrum of problems. We quantify explicitly the extent to which the dimensionality can be reduced by independence screening, which depends on the interaction between the covariance matrix of the covariates and the true parameters. Simulation studies illustrate the utility of the proposed approaches. In addition, we …

Supported in part by NSF grants DMS-0714554 and DMS-0704337. The bulk of the work was conducted when Rui Song was a postdoctoral research fellow at Princeton University. The authors would like to thank the associate editor and two referees for their constructive comments, which improved the presentation and the results of the paper.

AMS 2000 subject classifications: Primary 68Q32, 62J12; secondary 62E99, 60F10.
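To make the screening recipe concrete, the following is a minimal sketch of marginal maximum-likelihood screening in a logistic regression (a canonical GLM), fitting one univariate model per covariate and keeping the variables with the largest marginal coefficient magnitudes. The simulated data, the variable names (X, y) and the screening size d are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of sure independence screening by ranking marginal MLEs in a
# logistic regression; the simulated data and the names X, y, d are
# illustrative assumptions, not the authors' implementation.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, p, d = 200, 1000, 20                 # sample size, dimensionality, screening size
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0                          # a handful of active covariates
prob = 1.0 / (1.0 + np.exp(-X @ beta))
y = (rng.random(n) < prob).astype(int)

# Fit p univariate (marginal) logistic regressions and record |beta_hat_j|.
scores = np.empty(p)
for j in range(p):
    xj = sm.add_constant(X[:, [j]])     # intercept plus a single covariate
    fit = sm.Logit(y, xj).fit(disp=0)
    scores[j] = abs(fit.params[1])

# Keep the d covariates with the largest marginal coefficient magnitudes.
selected = np.argsort(scores)[::-1][:d]
print("screened covariates:", np.sort(selected))
```

Ranking by the maximum marginal likelihood itself, rather than by the estimate, would amount to replacing `abs(fit.params[1])` with the fitted marginal log-likelihood `fit.llf` in the sketch above.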

[1] W. Hoeffding, Probability Inequalities for Sums of Bounded Random Variables, 1963.

[2] D. Cox et al., An Analysis of Transformations, 1964.

[3] D. R. Cox et al., Regression Models and Life Tables (with discussion), 1972.

[4] P. Bickel et al., Mathematical Statistics: Basic Ideas and Selected Topics, 1977.

[5] J. Friedman et al., Projection Pursuit Regression, 1981.

[6] P. Bickel et al., An Analysis of Transformations Revisited, 1981.

[7] H. White, Maximum Likelihood Estimation of Misspecified Models, 1982.

[8] L. Fahrmeir et al., Correction: Consistency and Asymptotic Normality of the Maximum Likelihood Estimator in Generalized Linear Models, 1985.

[9] L. Fahrmeir et al., Asymptotic Inference in Discrete Response Models, 1986.

[10] M. Talagrand et al., Probability in Banach Spaces: Isoperimetry and Processes, 1991.

[11] J. Friedman et al., A Statistical View of Some Chemometrics Regression Tools, 1993.

[12] J. A. Wellner et al., Weak Convergence and Empirical Processes: With Applications to Statistics, 1996.

[13] R. Tibshirani, Regression Shrinkage and Selection via the Lasso, 1996.

[14] P. Massart et al., About the Constants in Talagrand's Concentration Inequalities for Empirical Processes, 2000.

[15] J. Fan et al., Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties, 2001.

[16] C. Pillers Dobler, Mathematical Statistics: Basic Ideas and Selected Topics (vol. 1, 2nd ed.), 2002.

[17] S. van de Geer, M-estimation Using Penalties or Sieves, 2002.

[18] M. R. Kosorok et al., Robust Inference for Univariate Proportional Hazards Frailty Regression Models, 2004.

[19] Y. Ritov et al., Persistence in High-Dimensional Linear Predictor Selection and the Virtue of Overparametrization, 2004.

[20] J. Fan et al., Sure Independence Screening for Ultrahigh Dimensional Feature Space, 2006, math/0612857.

[21] H. Zou, The Adaptive Lasso and Its Oracle Properties, 2006.

[22] E. Candès and T. Tao, The Dantzig Selector: Statistical Estimation When p Is Much Larger Than n, 2005, math/0506081.

[23] H. Zou et al., One-Step Sparse Estimates in Nonconcave Penalized Likelihood Models, 2008, Annals of Statistics.

[24] J. Fan et al., High Dimensional Classification Using Features Annealed Independence Rules, 2007, Annals of Statistics.

[25] S. van de Geer, High-Dimensional Generalized Linear Models and the Lasso, 2008, 0804.0703.

[26] J. Horowitz et al., Asymptotic Properties of Bridge Estimators in Sparse High-Dimensional Regression Models, 2008, 0804.0693.

[27] Y. Wu et al., Ultrahigh Dimensional Feature Selection: Beyond the Linear Model, 2009, J. Mach. Learn. Res.

[28] P. Hall et al., Tilting Methods for Assessing the Influence of Components in a Classifier, 2009.

[29] P. Hall et al., Using Generalized Correlation to Effect Variable Selection in Very High Dimensional Problems, 2009.

[30] D. Zeng et al., Maximum Likelihood Estimation in Semiparametric Regression Models with Censored Data, 2007, Statistica Sinica.