A comparison of imputation methods in a longitudinal randomized clinical trial

Longitudinal clinical trials commonly face item non-response, unit non-response, and drop-out. In this paper, we compare two methods of handling multivariate incomplete data across a baseline assessment and three follow-up time points in a multi-centre randomized controlled trial of a disease management programme for late-life depression. The first approach combines hot-deck (HD) multiple imputation, using predictive mean matching for item non-response and the approximate Bayesian bootstrap for unit non-response. The second is based on a multivariate normal (MVN) model, implemented using PROC MI in SAS software V8.2. Both methods are contrasted with a last observation carried forward (LOCF) technique and with available-case (AC) analysis in a simulation study in which replicate analyses are performed on subsets of the originally complete cases. Missing-data patterns were simulated to match those found in the originally incomplete cases, and the observed complete-data means were taken as the targets of estimation. Not surprisingly, the LOCF and AC methods had poor coverage for many of the variables evaluated. Multiple imputation under the MVN model performed well for most variables but produced below-nominal coverage for variables with highly skewed distributions. The HD method consistently produced close to nominal coverage, with intervals that were on average roughly 7 per cent wider than those from the MVN model.
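To make the hot-deck machinery concrete, here is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes a single fully observed predictor `x` for predictive mean matching, fits that regression by ordinary least squares on a bootstrap sample of the observed cases (an assumed way of reflecting parameter uncertainty; the abstract does not say how the paper drew parameters), and uses a hypothetical donor-pool size `k`. The function names are hypothetical. It also pools M completed-data estimates with Rubin's combining rules, the usual way multiple-imputation intervals like those compared above are formed.

```python
import random
import statistics
from typing import List, Optional, Sequence, Tuple

Record = List[float]


def abb_unit_impute(respondents: Sequence[Record], n_nonrespondents: int,
                    rng: random.Random) -> List[Record]:
    """Approximate Bayesian bootstrap for unit non-response: resample the
    respondents with replacement, then draw whole donor records for the
    non-respondents with replacement from that resample."""
    n = len(respondents)
    pool = [respondents[rng.randrange(n)] for _ in range(n)]
    return [list(pool[rng.randrange(n)]) for _ in range(n_nonrespondents)]


def _fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares intercept and slope for one predictor."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((xv - mx) ** 2 for xv in xs)
    sxy = sum((xv - mx) * (yv - my) for xv, yv in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my - slope * mx, slope


def pmm_item_impute(x: Sequence[float], y: Sequence[Optional[float]], k: int,
                    rng: random.Random) -> List[float]:
    """Predictive mean matching hot-deck for item non-response in y.
    Assumption: one predictor x, and regression parameters estimated on a
    bootstrap sample of the observed cases so repeated calls differ."""
    obs = [i for i, v in enumerate(y) if v is not None]
    boot = [obs[rng.randrange(len(obs))] for _ in obs]
    a, b = _fit_line([x[i] for i in boot], [y[i] for i in boot])
    pred = [a + b * xi for xi in x]
    out: List[float] = []
    for i, v in enumerate(y):
        if v is not None:
            out.append(v)
        else:
            # The k observed cases whose predicted means are closest to case i;
            # one is chosen at random and its observed value is copied.
            donors = sorted(obs, key=lambda j: abs(pred[j] - pred[i]))[:k]
            out.append(y[rng.choice(donors)])
    return out


def rubin_combine(estimates: Sequence[float],
                  variances: Sequence[float]) -> Tuple[float, float]:
    """Rubin's rules: pooled estimate and total variance T = W + (1 + 1/M) B,
    where W is the mean within-imputation variance and B the between-imputation
    variance. Requires M >= 2."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)
    w = statistics.mean(variances)
    b = statistics.variance(estimates)
    return q_bar, w + (1 + 1 / m) * b


if __name__ == "__main__":
    rng = random.Random(2004)
    x = [float(i) for i in range(20)]
    # Every fourth y is missing in this toy example.
    y = [2.0 * xi + rng.gauss(0, 1) if i % 4 else None for i, xi in enumerate(x)]
    ests, within = [], []
    for _ in range(5):  # M = 5 imputations
        yc = pmm_item_impute(x, y, k=3, rng=rng)
        ests.append(statistics.mean(yc))
        within.append(statistics.variance(yc) / len(yc))
    print(rubin_combine(ests, within))
```

Because every imputed value or record in this sketch is copied from an observed donor, the imputations stay within the observed support and inherit the shape of the observed distribution, which a draw from a fitted normal model does not do for a highly skewed variable.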
