Missing data and multiple imputation in clinical epidemiological research

Missing data are ubiquitous in clinical epidemiological research. Individuals with missing data may differ from those with complete data in the outcome of interest and in prognosis generally. Missing data are usually classified into three types: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). In clinical epidemiological research, missing data are seldom MCAR. Missing data can pose considerable challenges for the analysis and interpretation of results and can weaken the validity of results and conclusions. A number of methods have been developed for dealing with missing data. These include complete-case analysis, the missing indicator method, single value imputation, and sensitivity analyses incorporating worst-case and best-case scenarios. If applied under the MCAR assumption, some of these methods can provide unbiased but often less precise estimates. Multiple imputation is an alternative method that accounts for the uncertainty associated with missing data. It is implemented in most statistical software under the MAR assumption and, when that assumption holds and the imputation model is appropriately specified, provides unbiased and valid estimates of associations based on information from the available data. The method affects not only the coefficient estimates for variables with missing data but also the estimates for variables with no missing data.
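
To make concrete how multiple imputation carries the uncertainty from missing data into the final estimate, the following minimal sketch pools per-imputation results using the combining rules commonly known as Rubin's rules. The function name `pool_rubin` and the numeric inputs are hypothetical and chosen only for illustration; the sketch assumes the same model has already been fitted to each of m imputed data sets, yielding one coefficient estimate and one squared standard error per data set.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool results from m imputed data sets with Rubin's rules.

    estimates: one coefficient estimate per imputed data set
    variances: the matching squared standard errors
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (divisor m - 1)

    # The total variance adds the between-imputation spread, inflated by
    # (1 + 1/m) because only a finite number of imputations is used.
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Hypothetical log odds ratios and squared standard errors from m = 5 imputations.
est, se = pool_rubin(
    [0.42, 0.47, 0.39, 0.45, 0.44],
    [0.010, 0.011, 0.010, 0.012, 0.011],
)
print(f"pooled estimate = {est:.3f}, pooled SE = {se:.3f}")
```

The between-imputation term is what single value imputation leaves out: treating one filled-in data set as if it were observed ignores the variation across plausible imputations and therefore tends to understate the standard error.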
