Independent screening for single‐index hazard rate models with ultrahigh dimensional features

Summary. In data sets with many more features than observations, independent screening, which ranks features by the strength of their univariate (marginal) regressions on the response, provides a computationally convenient variable selection method. Recent work has shown that, for generalized linear models, independent screening may capture all relevant features with high probability even in ultrahigh dimension (the sure screening property). It is unclear whether this property is attainable when the response is a right-censored survival time. We propose a computationally very efficient independent screening method for survival data which can be viewed as the natural survival equivalent of correlation screening. We state conditions under which the method has the sure screening property within a class of single-index hazard rate models with ultrahigh dimensional features, and we describe the generally detrimental effect of censoring on performance. We also describe an iterative variant of the method which combines screening with penalized regression to handle more complex feature covariance structures. The methodology is evaluated through simulation studies and through application to a real gene expression data set.
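The summary characterizes the screening statistic only as the survival counterpart of correlation screening. As a minimal sketch, assume that for each standardized feature the statistic sums, over observed event times, the feature value minus its average over subjects still at risk. This is the score of a univariate Cox partial likelihood evaluated at zero (Breslow handling of ties). Features are then ranked by the absolute value of this statistic. The names `event_time_score` and `screen`, the screening size `d`, and the simulated data are hypothetical illustrations, not the paper's notation or code.

```python
import numpy as np


def event_time_score(X, time, event):
    """Hypothetical survival analogue of marginal correlation.

    For each feature j, computes
        s_j = (1/n) * sum_i event_i * (X_ij - Xbar_j(time_i)),
    where Xbar_j(t) is the mean of feature j over subjects at risk at t
    (those with time >= t).
    """
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    n = X.shape[0]
    # at_risk[i, k] = 1 when subject k is still at risk at time_i.
    at_risk = (time[None, :] >= time[:, None]).astype(float)
    # Row i holds the at-risk mean of every feature at time_i.
    risk_means = (at_risk @ X) / at_risk.sum(axis=1, keepdims=True)
    # Censored subjects (event_i = 0) contribute only through risk sets.
    return (event[:, None] * (X - risk_means)).sum(axis=0) / n


def screen(X, time, event, d):
    """Return indices of the d features with the largest |score| (assumed rule)."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant features
    Xs = (X - X.mean(axis=0)) / sd  # standardize so scores are comparable
    stat = event_time_score(Xs, time, event)
    return np.argsort(-np.abs(stat))[:d]


# Usage: simulated data in which only feature 0 affects the hazard
# (proportional hazards with rate exp(2 * X0), one single-index model).
rng = np.random.default_rng(0)
n, p = 200, 1000
X = rng.standard_normal((n, p))
t_event = rng.exponential(1.0 / np.exp(2.0 * X[:, 0]))
t_cens = rng.exponential(2.0, size=n)
time = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(int)
print(screen(X, time, event, d=10))
```

The at-risk matrix makes the sketch cost O(n²p), which is fine for illustration. A sorted cumulative-sum pass over distinct times would scale better for large n.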
