In this issue of JAMA, Asch et al¹ report results of a cluster-randomized clinical trial designed to evaluate the effects of physician financial incentives, patient incentives, or shared physician and patient incentives on low-density lipoprotein cholesterol (LDL-C) levels among patients with high cardiovascular risk. Because 1 or more follow-up LDL-C measurements were missing for approximately 7% of participants, Asch et al used multiple imputation (MI) to analyze their data and concluded that shared financial incentives for physicians and patients, but not incentives to physicians or patients alone, resulted in lower LDL-C levels among patients. Imputation is the process of replacing missing data with 1 or more specific values so that the statistical analysis can include all participants, not just those with complete data.
Missing data are common in research. In a previous JAMA Guide to Statistics and Methods, Newgard and Lewis² reviewed the causes of missing data. These are divided into 3 classes: (1) missing completely at random, the most restrictive assumption, indicating that whether a data point is missing is completely unrelated to observed and unobserved data; (2) missing at random, a more realistic assumption than missing completely at random, indicating that whether a data point is missing can be explained by the observed data; or (3) missing not at random, meaning that the missingness depends on the unobserved values. Common statistical methods used for handling missing values were also reviewed.² When missing data occur, it is important not to exclude cases with missing information (analyses after such exclusion are known as complete case analyses). Single-value imputation methods estimate what each missing value might have been and replace it with a single value in the data set. They include mean imputation, last observation carried forward, and random imputation. These approaches can yield biased results and are suboptimal. Multiple imputation better handles missing data by estimating and replacing missing values many times.
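The contrast between complete case analysis, single-value imputation, and multiple imputation can be illustrated with a small simulation. The following is a minimal sketch in Python and not the analysis performed by Asch et al. Everything in it is an assumption made for illustration: the variables x (an always-observed baseline measure) and y (a follow-up outcome with a true mean of 100), a missing-at-random mechanism in which y is missing more often when x is high, an exaggerated missingness rate, a linear regression imputation model refit on a bootstrap resample for each imputation to approximate uncertainty in the model, and pooling of the imputed data sets with the standard combining rules (Rubin's rules).

```python
import random
import statistics


def simulate(n=2000, seed=0):
    """Hypothetical data: x is always observed; y is missing more often when x > 0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = 100 + 10 * x + rng.gauss(0, 5)  # true mean of y is 100
        missing = rng.random() < (0.6 if x > 0 else 0.1)
        rows.append((x, None if missing else y))
    return rows


def fit_line(pairs):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual SD)."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in pairs]
    sd = (sum(r * r for r in resid) / (len(pairs) - 2)) ** 0.5
    return a, b, sd


def multiple_imputation_mean(rows, m=20, seed=1):
    """Estimate the mean of y from m imputed data sets, pooled with Rubin's rules."""
    rng = random.Random(seed)
    observed = [(x, y) for x, y in rows if y is not None]
    estimates, variances = [], []
    for _ in range(m):
        # Refit the imputation model on a bootstrap resample of the observed
        # cases (a simplification standing in for a posterior draw).
        boot = [rng.choice(observed) for _ in observed]
        a, b, sd = fit_line(boot)
        # Replace each missing y with a prediction plus random residual noise.
        completed = [
            y if y is not None else a + b * x + rng.gauss(0, sd)
            for x, y in rows
        ]
        estimates.append(statistics.fmean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)      # average within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5


rows = simulate()
observed_y = [y for _, y in rows if y is not None]

# Complete case analysis: drop every participant with a missing y.
complete_case_mean = statistics.fmean(observed_y)

# Mean imputation: fill every gap with the same observed mean.
mean_imputed = [y if y is not None else complete_case_mean for _, y in rows]

mi_mean, mi_se = multiple_imputation_mean(rows)

print(f"complete case mean: {complete_case_mean:.2f}")
print(f"mean-imputed SD:    {statistics.stdev(mean_imputed):.2f} "
      f"(observed SD {statistics.stdev(observed_y):.2f})")
print(f"MI pooled mean:     {mi_mean:.2f} (SE {mi_se:.2f})")
```

Under these assumptions, the complete case mean falls below the true value because participants with high x are underrepresented. Mean imputation leaves that estimate unchanged while shrinking the apparent variability. The multiple imputation estimate uses the observed relationship between x and y to recover the mean, and the between-imputation term adds the extra uncertainty due to the missing values to its standard error.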
[1] David B. Allison, et al. Missing Data in Randomized Clinical Trials for Weight Loss: Scope of the Problem, State of the Field, and Performance of Statistical Methods. PLoS One. 2009.
[2] James B. Jones, et al. Effect of Financial Incentives to Physicians, Patients, or Both on Lipid Levels: A Randomized Clinical Trial. JAMA. 2015.
[3] Jerome P. Reiter, et al. Multiple Imputation for Sharing Precise Geographies in Public Use Data. The Annals of Applied Statistics. 2012.
[4] C. Newgard, et al. Missing Data: How to Best Account for What Is Not Known. JAMA. 2015.
[5] Joseph L. Schafer, et al. Analysis of Incomplete Multivariate Data. 1997.
[6] David B. Allison, et al. Double Sampling with Multiple Imputation to Answer Large Sample Meta-Research Questions: Introduction and Illustration by Evaluating Adherence to Two Simple CONSORT Guidelines. Front. Nutr. 2015.
[7] D. Allison, et al. Multiple Imputation to Correct for Measurement Error in Admixture Estimates in Genetic Structured Association Testing. Human Heredity. 2009.
[8] Roger A. Sugden, et al. Multiple Imputation for Nonresponse in Surveys. 1988.
[9] Patrick Royston, et al. Multiple Imputation Using Chained Equations: Issues and Guidance for Practice. Statistics in Medicine. 2011.