Building generalized linear models with ultrahigh dimensional features: A sequentially conditional approach

Conditional screening approaches have emerged as a powerful alternative to the commonly used marginal screening because they can identify variables that are marginally weak but conditionally important. However, most existing conditional screening methods require the initial conditioning set to be fixed in advance, and this choice can determine which variables are ultimately selected. If the conditioning set is not properly chosen, these methods may produce both false negatives and false positives. Moreover, screening approaches typically require tuning parameters and additional modeling steps to reach a final model. We propose a sequential conditioning approach that dynamically updates the conditioning set through an iterative selection process. We establish its theoretical properties under the framework of generalized linear models. With an extended Bayesian information criterion (EBIC) as the stopping rule, the method yields a final model without the need to choose tuning or threshold parameters. The practical utility of the proposed method is examined via extensive simulations and the analysis of a real clinical study on predicting multiple myeloma patients' response to treatment based on their genomic profiles.
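To make the idea concrete, the iterative selection described above can be sketched for a logistic model as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`fit_logistic`, `ebic`, `sequential_conditional_select`), the choice of the log-likelihood gain as the selection statistic, the EBIC form of Chen and Chen (2008) with `gamma=1.0`, and the cap `max_size` are all hypothetical choices made here for illustration.

```python
import numpy as np
from math import lgamma, log, inf


def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Fit a logistic regression by Newton-Raphson; X must include an intercept column.

    Returns (beta, loglik). The tiny ridge term only stabilizes the linear solve.
    """
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30, 30)
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = mu * (1.0 - mu)
        grad = X.T @ (y - mu)
        hess = (X * w[:, None]).T @ X + ridge * np.eye(d)
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    eta = np.clip(X @ beta, -30, 30)
    loglik = float(np.sum(y * eta - np.logaddexp(0.0, eta)))
    return beta, loglik


def ebic(loglik, k, n, p, gamma=1.0):
    """EBIC (assumed form): -2*loglik + k*log(n) + 2*gamma*log(C(p, k))."""
    log_binom = lgamma(p + 1) - lgamma(k + 1) - lgamma(p - k + 1)
    return -2.0 * loglik + k * log(n) + 2.0 * gamma * log_binom


def sequential_conditional_select(X, y, gamma=1.0, max_size=10):
    """Grow the conditioning set one variable at a time; stop when EBIC stops decreasing."""
    n, p = X.shape
    ones = np.ones((n, 1))
    selected = []
    _, ll = fit_logistic(ones, y)          # intercept-only starting model
    best_ebic = ebic(ll, 0, n, p, gamma)
    while len(selected) < min(max_size, p):
        best_j, best_ll = None, -inf
        for j in range(p):
            if j in selected:
                continue
            # Refit conditional on the current set plus candidate j.
            Z = np.hstack([ones, X[:, selected + [j]]])
            _, ll_j = fit_logistic(Z, y)
            if ll_j > best_ll:
                best_j, best_ll = j, ll_j
        cand_ebic = ebic(best_ll, len(selected) + 1, n, p, gamma)
        if cand_ebic >= best_ebic:         # stopping rule: no EBIC improvement
            break
        selected.append(best_j)
        best_ebic = cand_ebic
    return selected
```

Calling `sequential_conditional_select(X, y)` on an `n x p` feature matrix `X` and a binary response `y` returns the indices in the order they entered the model. No threshold or penalty parameter is tuned in this sketch: the conditioning set is updated after every step, and the EBIC comparison alone decides when to stop.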
