Improving on analyses of self-reported data in a large-scale health survey by using information from an examination-based survey

Common data sources for assessing the health of a population of interest include large-scale interview-based surveys. These often pose questions requiring a self-report, such as 'Has a doctor or other health professional ever told you that you have [health condition]?' or 'What is your (height/weight)?' Answers to such questions might not always reflect the true prevalences of health conditions, for example if a respondent misreports height/weight or does not have access to a doctor or other health professional. Such 'measurement error' in health data could affect inferences about measures of health and health disparities. Drawing on two surveys conducted by the National Center for Health Statistics, this paper describes an imputation-based strategy for using clinical information from an examination-based health survey to improve on analyses of self-reported data in a larger interview-based health survey. Models predicting clinical values from self-reported values and covariates are fitted to data from the National Health and Nutrition Examination Survey (NHANES). NHANES asks self-report questions during an interview component and also obtains clinical measurements during a physical examination component. The fitted models are used to multiply impute clinical values for the National Health Interview Survey (NHIS), a larger survey that obtains data solely via interviews. Illustrations involving hypertension, diabetes, and obesity suggest two things. First, estimates of health measures based on the multiply imputed clinical values differ from those based on the NHIS self-reported data alone. Second, they have smaller estimated standard errors than those based solely on the NHANES clinical data. The paper discusses the relationship of the methods used in the study to two-phase/two-stage/validation sampling and estimation, along with limitations, practical considerations, and areas for future research.
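The fit-in-one-survey, impute-in-the-other strategy can be illustrated with a minimal sketch. It rests on several simplifying assumptions:

* There is a single binary self-report (e.g., 'ever told you have hypertension') as the only predictor.
* Both surveys are simple random samples.
* The data are synthetic.

The cell-proportion model and the helper names `fit_cell_counts`, `impute_once`, `rubin_combine`, and `mi_prevalence` are hypothetical stand-ins. They are not the paper's models, which condition on covariates and would need to reflect the complex NHANES and NHIS designs.

The sketch works in four steps:

1. Each imputation draws P(clinical | self-report) from a Beta posterior fitted to the examination-style data.
2. It then imputes clinical status for every record in the interview-style sample.
3. It estimates the prevalence from the imputed values.
4. The M estimates are combined with Rubin's rules: the total variance equals the mean within-imputation variance plus (1 + 1/M) times the between-imputation variance.

```python
import random
import statistics


def fit_cell_counts(pairs):
    """Count clinical positives/negatives within each self-report cell.

    Hypothetical helper. `pairs` holds (self_report, clinical) tuples with
    values in {0, 1}, standing in for an examination survey where both are seen.
    Returns {self_report: (n_clinical_pos, n_clinical_neg)}.
    """
    counts = {0: [0, 0], 1: [0, 0]}
    for s, c in pairs:
        counts[s][0 if c == 1 else 1] += 1
    return {s: tuple(v) for s, v in counts.items()}


def impute_once(counts, self_reports, rng):
    """Produce one set of imputed clinical values for an interview-only sample."""
    # Draw P(clinical = 1 | self-report = s) from its Beta(1 + pos, 1 + neg)
    # posterior (uniform prior) so each imputation reflects model uncertainty.
    p = {s: rng.betavariate(1 + pos, 1 + neg) for s, (pos, neg) in counts.items()}
    # Then draw each record's clinical status from a Bernoulli(p[s]).
    return [1 if rng.random() < p[s] else 0 for s in self_reports]


def rubin_combine(estimates, variances):
    """Rubin's combining rules for a scalar estimand across m imputations."""
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputations")
    q_bar = statistics.fmean(estimates)    # combined point estimate
    u_bar = statistics.fmean(variances)    # mean within-imputation variance
    b = statistics.variance(estimates)     # between-imputation variance
    total = u_bar + (1 + 1 / m) * b        # total variance T
    return q_bar, total


def mi_prevalence(counts, self_reports, m=20, seed=0):
    """Multiply impute clinical status and estimate its prevalence."""
    rng = random.Random(seed)
    n = len(self_reports)
    estimates, variances = [], []
    for _ in range(m):
        imputed = impute_once(counts, self_reports, rng)
        p_hat = sum(imputed) / n
        estimates.append(p_hat)
        # Simple-random-sampling variance of a proportion (assumption: no
        # weights, strata, or clusters, unlike the real NHIS design).
        variances.append(p_hat * (1 - p_hat) / n)
    return rubin_combine(estimates, variances)


if __name__ == "__main__":
    # Synthetic stand-in data, NOT actual NHANES or NHIS values.
    exam = [(1, 1)] * 80 + [(1, 0)] * 10 + [(0, 1)] * 30 + [(0, 0)] * 280
    interview = [1] * 1500 + [0] * 8500
    counts = fit_cell_counts(exam)
    est, total_var = mi_prevalence(counts, interview, m=20, seed=1)
    print(f"self-report prevalence: {sum(interview) / len(interview):.3f}")
    print(f"MI clinical prevalence: {est:.3f} (SE {total_var ** 0.5:.4f})")
```

Because the between-imputation variance is never negative, the combined variance T is never smaller than the average within-imputation variance. This is how the sketch carries uncertainty about the fitted model into the final standard error. Applying it to real survey data would require design-based variances and a model that includes covariates.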
