Multiple Imputation to Account for Missing Data in a Survey: Estimating the Prevalence of Osteoporosis

Background. Nonresponse bias is a concern in any epidemiologic survey in which a subset of selected individuals declines to participate.

Methods. We reviewed multiple imputation, a widely applicable and easy-to-implement Bayesian methodology for adjusting for nonresponse bias. To illustrate the method, we used data from the Canadian Multicentre Osteoporosis Study, a large cohort study of 9423 randomly selected Canadians designed in part to estimate the prevalence of osteoporosis. Although subjects were randomly selected, only 42% of the individuals contacted agreed to participate fully in the study. To collect information on the major risk factors for osteoporosis, the study design included a brief questionnaire for invitees who declined further participation. These risk factors (age, sex, previous fractures, family history of osteoporosis, and current smoking status) were then used to estimate the missing osteoporosis status of nonparticipants by multiple imputation. Both ignorable and nonignorable imputation models were considered.

Results. Our results suggest that selection bias in the study is of concern, though only slightly, among the very elderly (aged 80+ years), in both women and men.

Conclusions. Epidemiologists should consider using multiple imputation more often than is current practice.
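The abstract does not specify the imputation model, so the following is only a minimal sketch of the general multiple-imputation recipe under stated assumptions. It is not the model used in the study. As a simplified stand-in, it assumes a Beta-Binomial model within hypothetical risk-factor strata. Each imputation draws a stratum prevalence from its posterior given the participants, then draws osteoporosis status for each nonparticipant. Setting `odds_ratio = 1` corresponds to an ignorable model. Other values multiply the odds of osteoporosis for nonparticipants, which gives one simple form of nonignorable sensitivity analysis. The m completed-data prevalences are combined with Rubin's rules: the pooled estimate is their mean, and the total variance is T = W + (1 + 1/m)B, where W is the average within-imputation variance and B is the between-imputation variance. The names `Invitee`, `impute_once`, `rubin_combine`, `mi_prevalence`, and `odds_ratio` are hypothetical. The data are synthetic. The variance ignores any survey weighting or design effects.

```python
import random
import statistics
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Invitee:
    stratum: str           # hypothetical risk-factor cell, e.g. "80+"
    osteo: Optional[int]   # 1/0 if measured (participant); None if nonparticipant


def impute_once(people: List[Invitee], rng: random.Random,
                odds_ratio: float = 1.0) -> List[int]:
    """Create one completed dataset by drawing from the posterior predictive."""
    # Count observed cases and non-cases per stratum (strata with no
    # participants fall back to the uniform Beta(1, 1) prior).
    counts = {}
    for p in people:
        c = counts.setdefault(p.stratum, [0, 0])
        if p.osteo is not None:
            c[0] += p.osteo
            c[1] += 1 - p.osteo
    # One posterior draw of each stratum's prevalence: Beta(1 + cases, 1 + non-cases).
    theta = {s: rng.betavariate(1 + a, 1 + b) for s, (a, b) in counts.items()}
    completed = []
    for p in people:
        if p.osteo is not None:
            completed.append(p.osteo)
            continue
        t = theta[p.stratum]
        # odds_ratio == 1: ignorable model. Otherwise the odds of osteoporosis
        # for nonparticipants are multiplied by odds_ratio (assumed sensitivity parameter).
        prob = odds_ratio * t / (odds_ratio * t + (1 - t))
        completed.append(1 if rng.random() < prob else 0)
    return completed


def rubin_combine(estimates: List[float], variances: List[float]) -> Tuple[float, float]:
    """Rubin's rules: pooled estimate and total variance W + (1 + 1/m) B."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)
    w = statistics.fmean(variances)      # average within-imputation variance
    b = statistics.variance(estimates)   # between-imputation variance
    return q_bar, w + (1 + 1 / m) * b


def mi_prevalence(people: List[Invitee], m: int = 20, seed: int = 0,
                  odds_ratio: float = 1.0) -> Tuple[float, float]:
    """Multiply impute, estimate prevalence in each completed dataset, then pool."""
    rng = random.Random(seed)
    n = len(people)
    estimates, variances = [], []
    for _ in range(m):
        y = impute_once(people, rng, odds_ratio)
        p_hat = sum(y) / n
        estimates.append(p_hat)
        variances.append(p_hat * (1 - p_hat) / n)  # simple random sampling variance
    return rubin_combine(estimates, variances)


if __name__ == "__main__":
    # Made-up illustrative data, NOT CaMos data: (stratum, size, true rate, response rate).
    gen = random.Random(1)
    people = []
    for stratum, size, rate, resp in [("under 80", 900, 0.12, 0.45), ("80+", 100, 0.35, 0.30)]:
        for _ in range(size):
            y = 1 if gen.random() < rate else 0
            people.append(Invitee(stratum, y if gen.random() < resp else None))
    for r in (1.0, 1.5):
        est, var = mi_prevalence(people, odds_ratio=r)
        print(f"odds_ratio={r}: prevalence {est:.3f} (SE {var ** 0.5:.3f})")
```

Rerunning `mi_prevalence` over a range of `odds_ratio` values shows how far the pooled prevalence would move if nonparticipants differed systematically from participants with the same risk factors. This is the question that the ignorable/nonignorable comparison in the abstract addresses.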
