Handling drop-out in longitudinal studies

Drop-out is a prevalent complication in the analysis of data from longitudinal studies, and remains an active area of research for statisticians and other quantitative methodologists. This tutorial synthesizes and illustrates the broad array of techniques used to address outcome-related drop-out, with emphasis on regression-based methods. We begin with a review of important assumptions underlying likelihood-based and semi-parametric models, followed by an overview of models and methods used to draw inferences from incomplete longitudinal data. Most of the tutorial is devoted to detailed analyses of two studies with substantial rates of drop-out, chosen to illustrate effective methods that are relatively easy to apply: in the first example, we use both semi-parametric and fully parametric models to analyse repeated binary responses from a clinical trial of smoking cessation interventions; in the second, pattern mixture models are used to analyse longitudinal CD4 counts from an observational cohort study of HIV-infected women. In each example, we describe exploratory analyses, model formulation, estimation methodology and interpretation of results. Analysis of incomplete data requires unverifiable assumptions, and these are discussed in detail within the context of each application. Relevant SAS code is provided.
