Time-varying Hazards Model for Incorporating Irregularly Measured, High-Dimensional Biomarkers.

Clinical studies with time-to-event outcomes often collect measurements of a large number of time-varying covariates over time (e.g., clinical assessments or neuroimaging biomarkers) to build time-sensitive prognostic models. An emerging challenge is that, because data collection can be resource-intensive or invasive (e.g., lumbar puncture), biomarkers may be measured infrequently and thus are not available at every observed event time point. Leveraging all available, infrequently measured time-varying biomarkers to improve prognostic models of event occurrence is an important and challenging problem. In this paper, we propose a kernel-smoothing based approach that borrows information across subjects to remedy infrequent and unbalanced biomarker measurements under a time-varying hazards model. A penalized pseudo-likelihood function is proposed for estimation, and an efficient augmented penalization minimization algorithm related to the alternating direction method of multipliers (ADMM) is adopted for computation. Under some regularity conditions that carefully control approximation bias and stochastic variability, we show that even in the presence of ultra-high dimensionality, the proposed method selects important biomarkers with high probability. Through extensive simulation studies, we demonstrate superior estimation and selection performance compared to alternative methods. Finally, we apply the proposed method to analyze a recently completed real-world study to model time to disease conversion using longitudinal, whole-brain structural magnetic resonance imaging (MRI) biomarkers, and show a substantial improvement in performance over current standards, including the use of baseline measures only.
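To make the kernel-smoothing idea concrete, the following is a minimal sketch of a kernel-weighted log pseudo-partial likelihood, in which biomarker measurements taken near an event time receive larger weight and all at-risk subjects contribute to the risk set. It is an illustration under stated assumptions, not the paper's estimator: the coefficient vector is held constant in time, the penalty term and the ADMM-type solver are omitted, the Epanechnikov kernel and bandwidth `h` are arbitrary choices, and every function and argument name is hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel; zero outside [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kernel_log_pseudo_likelihood(beta, event_time, event, subj, meas_time, Z, h):
    """Kernel-weighted log pseudo-partial likelihood (illustrative only).

    beta       : (p,) coefficients, assumed constant in time for simplicity
    event_time : (n,) observed follow-up time per subject
    event      : (n,) 1 if the event was observed, 0 if censored
    subj       : (m,) subject index of each biomarker measurement
    meas_time  : (m,) time at which each measurement was taken
    Z          : (m, p) biomarker values at each measurement
    h          : kernel bandwidth (assumed fixed here)
    """
    eta = Z @ beta  # linear predictor for every measurement
    total = 0.0
    for i in np.flatnonzero(event):
        t = event_time[i]
        # Weight each measurement by its closeness in time to the event time t.
        w = epanechnikov((t - meas_time) / h) / h
        # A measurement enters the risk set if its subject is still at risk at t.
        at_risk = event_time[subj] >= t
        denom = np.sum(w * at_risk * np.exp(eta))
        own_w = w * (subj == i)  # weights on the failing subject's own measurements
        if denom > 0 and own_w.sum() > 0:
            total += np.sum(own_w * (eta - np.log(denom)))
    return total
```

In this sketch a subject with no measurement within one bandwidth of an event time simply contributes nothing at that time, which is one way information is borrowed only from measurements that are close in time.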
