Statistical inference for Cox proportional hazards models with a diverging number of covariates

For statistical inference on regression models with a diverging number of covariates, the existing literature typically makes sparsity assumptions on the inverse of the Fisher information matrix. Such assumptions, however, are often violated under Cox proportional hazards models, leading to biased estimates and confidence intervals with coverage below the nominal level. We propose a modified debiased lasso approach, which solves a series of quadratic programming problems to approximate the inverse information matrix without imposing sparsity assumptions on that matrix. We establish asymptotic results for the estimated regression coefficients when the dimension of covariates diverges with the sample size. As demonstrated by extensive simulations, our proposed method provides consistent estimates and confidence intervals with nominal coverage probabilities. The utility of the method is further demonstrated by assessing the effects of genetic markers on patients’ overall survival with the Boston Lung Cancer Survival Cohort, a large-scale epidemiological study investigating the mechanisms underlying lung cancer.
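
To make the structure of such a procedure concrete, the following is a minimal sketch of a generic debiased lasso construction for the Cox model, written in the form commonly used in the debiasing literature. It is an illustration under stated assumptions, not necessarily the exact formulation of the paper: the notation ($\ell_n$, $\hat\beta$, $\hat\Sigma$, $\hat\Theta$, $\hat m_j$, $\lambda$, $\gamma_n$) and the specific form of the quadratic program are assumed here and do not appear in the abstract.

```latex
% Assumed notation: (Y_i, \delta_i, x_i), i = 1,...,n, are observed times,
% event indicators, and p-dimensional covariates; p may grow with n.

% Negative log partial likelihood and lasso initial estimator
\ell_n(\beta) = -\frac{1}{n}\sum_{i=1}^{n} \delta_i
  \Bigl[ x_i^\top \beta
  - \log\Bigl( \frac{1}{n}\sum_{k=1}^{n} 1(Y_k \ge Y_i)\, e^{x_k^\top \beta} \Bigr) \Bigr],
\qquad
\hat\beta = \arg\min_{\beta} \bigl\{ \ell_n(\beta) + \lambda \|\beta\|_1 \bigr\}.

% Estimated information matrix at the lasso solution
\hat\Sigma = \ddot{\ell}_n(\hat\beta).

% One quadratic program per coordinate j = 1,...,p (e_j is the j-th unit vector);
% no sparsity is required of the inverse information matrix
\hat m_j = \arg\min_{m \in \mathbb{R}^p}
  \bigl\{ m^\top \hat\Sigma\, m \;:\; \|\hat\Sigma m - e_j\|_\infty \le \gamma_n \bigr\},
\qquad
\hat\Theta = (\hat m_1, \dots, \hat m_p)^\top.

% One-step bias correction and a Wald-type (1 - \alpha) confidence interval
\hat b = \hat\beta - \hat\Theta\, \dot{\ell}_n(\hat\beta),
\qquad
\hat b_j \pm z_{\alpha/2}
  \sqrt{ \bigl( \hat\Theta \hat\Sigma \hat\Theta^\top \bigr)_{jj} / n }.
```

In this sketch the tuning parameter $\gamma_n$ controls how closely $\hat\Theta\hat\Sigma$ must match the identity. The constraint bounds the entrywise approximation error only, which is why a formulation of this kind does not need the inverse information matrix to be sparse.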
