Unpredictable bias when using the missing indicator method or complete case analysis for missing confounder values: an empirical example.

OBJECTIVE
The missing indicator method (MIM) and complete case analysis (CC) are frequently used to handle missing confounder data. Using empirical data, we demonstrated the degree and direction of bias in the effect estimate when these methods are used, compared with multiple imputation (MI).

STUDY DESIGN AND SETTING
From a cohort study, we selected an exposure (marital status), an outcome (depression), and confounders (age, sex, and income). Missing values in "income" were created according to different patterns of missingness: completely at random, and dependent on the values of the exposure and the outcome. The percentage of missing values ranged from 2.5% to 30%.

RESULTS
When values were missing completely at random, MIM overestimated the odds ratio, whereas CC and MI gave unbiased results. When missingness depended on observed values, both MIM and CC under- or overestimated the odds ratio. The magnitude and direction of the bias depended on how missingness was related to the exposure and the outcome, and the bias increased with the percentage of missing values.

CONCLUSION
MIM should not be used to handle missing confounder data, because it gives unpredictable bias in the odds ratio even with small percentages of missing values. CC can be used when values are missing completely at random, but it entails a loss of statistical power.
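
To make the three approaches concrete, here is a minimal simulation sketch. It is not the authors' analysis. It replaces the cohort with a hypothetical data-generating model in which a binary confounder C (standing in for income) raises both the probability of exposure E and of outcome D, with a true conditional odds ratio of 2. It uses Mantel-Haenszel stratification instead of logistic regression and implements MIM in its categorical form, where "missing" becomes an extra confounder stratum. The MI step assumes a simple saturated imputation model for C given E and D and pools only the point estimate. Full Rubin's rules would also combine within- and between-imputation variances. All function names, probabilities, and sample sizes are illustrative assumptions.

```python
import math
import random

def simulate_cohort(n, rng):
    """Hypothetical model: binary confounder C, exposure E, outcome D.
    C raises both P(E) and P(D); the true E-D odds ratio given C is 2."""
    rows = []
    for _ in range(n):
        c = int(rng.random() < 0.4)
        e = int(rng.random() < (0.6 if c else 0.3))
        logit = -2.0 + math.log(2.0) * e + 1.0 * c
        d = int(rng.random() < 1.0 / (1.0 + math.exp(-logit)))
        rows.append((e, d, c))
    return rows

def mh_odds_ratio(rows):
    """Mantel-Haenszel odds ratio of D on E, stratified on the third field.
    The stratum key may be None, so 'missing' can act as its own stratum."""
    strata = {}
    for e, d, c in rows:
        # cells per stratum: [E1D1, E1D0, E0D1, E0D0]
        cells = strata.setdefault(c, [0, 0, 0, 0])
        cells[2 * (1 - e) + (1 - d)] += 1
    num = den = 0.0
    for a, b, c0, d0 in strata.values():
        n = a + b + c0 + d0
        num += a * d0 / n
        den += b * c0 / n
    return num / den

def make_missing(rows, p_miss, rng):
    """Blank out C with probability p_miss(e, d); a constant p_miss gives MCAR."""
    return [(e, d, None if rng.random() < p_miss(e, d) else c) for e, d, c in rows]

def complete_case_or(rows):
    """CC: drop every record whose confounder value is missing."""
    return mh_odds_ratio([r for r in rows if r[2] is not None])

def missing_indicator_or(rows):
    """MIM (categorical form): records with missing C form an extra stratum."""
    return mh_odds_ratio(rows)

def multiple_imputation_or(rows, m, rng):
    """MI: impute C from P(C=1 | E, D) estimated in the complete cases,
    redrawing that probability from a Beta posterior for each imputation."""
    counts = {(e, d): [0, 0] for e in (0, 1) for d in (0, 1)}
    for e, d, c in rows:
        if c is not None:
            counts[(e, d)][c] += 1
    log_ors = []
    for _ in range(m):
        p = {k: rng.betavariate(1 + n1, 1 + n0) for k, (n0, n1) in counts.items()}
        filled = [(e, d, c if c is not None else int(rng.random() < p[(e, d)]))
                  for e, d, c in rows]
        log_ors.append(math.log(mh_odds_ratio(filled)))
    # Rubin's rules: the pooled point estimate is the mean of the m log-ORs
    return math.exp(sum(log_ors) / m)

def run_demo(seed=2024, n=20000):
    rng = random.Random(seed)
    full = simulate_cohort(n, rng)
    scenarios = {
        "MCAR 30%": make_missing(full, lambda e, d: 0.30, rng),
        # income missing more often among exposed cases than elsewhere
        "depends on E and D": make_missing(
            full, lambda e, d: 0.40 if (e and d) else 0.05, rng),
    }
    results = {"full data": mh_odds_ratio(full)}
    for name, rows in scenarios.items():
        results[name] = {
            "CC": complete_case_or(rows),
            "MIM": missing_indicator_or(rows),
            "MI": multiple_imputation_or(rows, 10, rng),
        }
    return results

if __name__ == "__main__":
    for name, value in run_demo().items():
        if isinstance(value, dict):
            value = {k: round(v, 2) for k, v in value.items()}
        else:
            value = round(value, 2)
        print(f"{name}: {value}")
```

Under MCAR, the CC estimate should stay close to the full-data odds ratio of about 2. When missingness is concentrated among exposed cases, the CC estimate drops markedly while MI stays near 2. In this setup the "missing" stratum used by MIM behaves like an unadjusted comparison, so MIM tends to be pulled toward the confounded crude odds ratio. Exact numbers depend on the seed.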
