Penalized estimation of semiparametric transformation models with interval-censored data and application to Alzheimer’s disease

Variable selection, or feature extraction, is fundamental to identifying important risk factors among a large number of covariates and has applications in many fields. In particular, its role in failure time data analysis has been recognized, and many methods have been proposed for right-censored data. Developing such methods is more challenging, however, under interval censoring, which often occurs in practice. In this article, motivated by an Alzheimer’s disease study, we develop a variable selection method for interval-censored data under a general class of semiparametric transformation models. Specifically, we develop a novel penalized expectation–maximization algorithm to maximize the complex penalized likelihood function, and a simulation study shows that the proposed method performs well in finite samples. The proposed methodology is then applied to the interval-censored data from the motivating Alzheimer’s disease study.
