Joint modeling of longitudinal data and discrete-time survival outcome

A predictive joint shared parameter model is proposed for discrete time-to-event and longitudinal data. A discrete survival model with frailty and a generalized linear mixed model for the longitudinal data are joined to predict the probability of events. The joint model focuses on predicting a discrete time-to-event outcome, taking advantage of the repeated measurements. We show that the probability of an event in a time window can be predicted more precisely by incorporating the longitudinal measurements. The model was evaluated by comparison with a two-step model and a discrete-time survival model. Results from both a study on the occurrence of tuberculosis and simulated data show that the joint model outperforms the other models in discrimination ability, especially as the latent variables related to both the survival times and the longitudinal measurements depart from 0.
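
To make the predicted quantity concrete, the following minimal sketch shows how the probability of an event within a time window follows from per-interval discrete hazards. It is an illustration under stated assumptions, not the paper's implementation: the logit link, the function names, the intercepts, the frailty value, and the association coefficient `gamma` are all hypothetical choices made for this example.

```python
import math


def discrete_hazard(eta):
    """Discrete-time hazard P(T = j | T >= j) under an assumed logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def window_event_probability(linear_predictors):
    """Probability of an event somewhere in the window, given no event before it.

    Uses the standard discrete-time identity 1 - prod_j (1 - h_j),
    where h_j is the hazard in interval j of the window.
    """
    survival = 1.0
    for eta in linear_predictors:
        survival *= 1.0 - discrete_hazard(eta)
    return 1.0 - survival


# Hypothetical inputs, chosen only for illustration.
baseline = [-2.0, -1.5, -1.0]  # assumed interval-specific intercepts
frailty = 0.5                  # assumed subject-level latent effect
gamma = 1.0                    # assumed association coefficient

# Linear predictor per interval: intercept plus scaled shared latent effect.
etas_with_frailty = [a + gamma * frailty for a in baseline]

p_without = window_event_probability(baseline)
p_with = window_event_probability(etas_with_frailty)
print(p_without, p_with)
```

In a shared parameter model the same latent effect would also enter the longitudinal submodel, so the repeated measurements inform its value. Here it is simply fixed by hand to show that a latent effect further from 0 shifts the window probability.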
