When data are not missing at random: implications for measuring health conditions in the Behavioral Risk Factor Surveillance System

Objectives: To examine how estimated levels of health conditions from large-scale surveys are affected when either list-wise respondent deletion or standard demographic item-level imputation is used, and to assess how much further bias reduction results from including correlated ancillary variables in the item imputation process.

Design: Large, national-level (US) cross-sectional household survey.

Participants: 218 726 US adults (18 years and older) in the 2006 Behavioral Risk Factor Surveillance System Survey. This survey is the largest US telephone survey conducted by the Centers for Disease Control and Prevention.

Primary and secondary outcome measures: Estimated rates of severe depression among US adults.

Results: The use of list-wise respondent deletion and/or demographic imputation results in the underestimation of severe depression among adults in the USA. List-wise deletion produces a relative underestimate of 9% (8.7% vs 9.5%). Demographic imputation produces a relative underestimate of 7% (8.9% vs 9.5%). Both of these differences are significant at the 0.05 level.

Conclusion: The use of list-wise deletion and/or demographic-only imputation may produce significant distortion in estimating national levels of certain health conditions.
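As a rough illustration of the arithmetic behind the Results, the sketch below recomputes the relative underestimates from the rounded rates quoted in the abstract and shows why list-wise deletion biases a prevalence estimate downward when people with the condition are more likely to have missing items. The function names and the missingness probabilities (0.20 and 0.10) are hypothetical, chosen only for illustration; they are not taken from the study. Recomputing from the rounded rates gives roughly 8.4% and 6.3%, close to the reported 9% and 7%; the gap is presumably due to rounding of the published rates, which is an assumption here.

```python
def relative_underestimate(estimate, reference):
    """Relative shortfall of an estimate against a reference rate."""
    return (reference - estimate) / reference


def complete_case_rate(true_rate, p_missing_cases, p_missing_noncases):
    """Expected prevalence after list-wise deletion.

    true_rate: prevalence in the full sample.
    p_missing_cases / p_missing_noncases: hypothetical probabilities that
    a respondent with / without the condition is dropped for missing items.
    """
    kept_cases = true_rate * (1.0 - p_missing_cases)
    kept_noncases = (1.0 - true_rate) * (1.0 - p_missing_noncases)
    return kept_cases / (kept_cases + kept_noncases)


if __name__ == "__main__":
    # Rounded rates quoted in the abstract (percent).
    reference = 9.5
    for label, estimate in [("list-wise deletion", 8.7),
                            ("demographic imputation", 8.9)]:
        shortfall = relative_underestimate(estimate, reference)
        print(f"{label}: {shortfall:.1%} relative underestimate")

    # Hypothetical missingness: cases are dropped more often than non-cases,
    # so the complete-case rate falls below the full-sample rate.
    biased = complete_case_rate(0.095, p_missing_cases=0.20,
                                p_missing_noncases=0.10)
    print(f"complete-case rate under assumed missingness: {biased:.3f}")
```

When the two missingness probabilities are equal, `complete_case_rate` returns the true rate unchanged, which is the case in which list-wise deletion does not distort the prevalence estimate.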
