STRATOS guidance document on measurement error and misclassification of variables in observational epidemiology: Part 1—Basic theory and simple methods of adjustment

Measurement error and misclassification of variables frequently occur in epidemiology and involve variables important to public health. Their presence can strongly affect the results of statistical analyses involving such variables. However, investigators commonly fail to pay attention to biases resulting from such mismeasurement. We provide, in two parts, an overview of the types of error that occur, their impacts on analytic results, and statistical methods to mitigate the biases that they cause. In this first part, we review different types of measurement error and misclassification, emphasizing the classical, linear, and Berkson models and the concepts of nondifferential and differential error. We describe the impacts of these types of error in covariates and in outcome variables on various analyses, including estimation and testing in regression models and estimating distributions. We outline types of ancillary studies required to provide information about such errors and discuss the implications of covariate measurement error for study design. Methods for ascertaining sample size requirements are outlined, both for ancillary studies designed to provide information about measurement error and for main studies where the exposure of interest is measured with error. We describe two of the simpler methods, regression calibration and simulation extrapolation (SIMEX), that adjust for bias in regression coefficients caused by measurement error in continuous covariates, and illustrate their use through examples drawn from the Observing Protein and Energy (OPEN) dietary validation study. Finally, we review software available for implementing these methods. The second part of the article deals with more advanced topics.
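
As a rough illustration of the classical error model and of regression calibration as named above, the following sketch simulates a true exposure X, two replicate error-prone measurements W1 and W2 with W = X + U, and a continuous outcome Y. It is not taken from the article or from the OPEN study: the function names, parameter values, and the simulated data are all assumptions chosen for illustration. In this simple single-covariate linear setting, regressing Y on W1 should give a slope attenuated by roughly var(X) / (var(X) + var(U)), and replacing W1 with an estimate of E[X | W1] should approximately undo that attenuation.

```python
import numpy as np


def simulate(n=20000, beta0=1.0, beta1=2.0, sigma_u=1.0, seed=0):
    """Simulate data under an assumed classical error model W = X + U.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n)              # true exposure (unobserved in practice)
    w1 = x + rng.normal(0.0, sigma_u, n)     # first error-prone measurement
    w2 = x + rng.normal(0.0, sigma_u, n)     # replicate with independent error
    y = beta0 + beta1 * x + rng.normal(0.0, 1.0, n)  # continuous outcome
    return y, w1, w2


def ols_slope(y, z):
    """Slope from simple linear regression of y on z."""
    c = np.cov(z, y)                         # 2x2 sample covariance matrix
    return c[0, 1] / c[0, 0]


def regression_calibration_slope(y, w1, w2):
    """Regression calibration for one covariate, using replicates.

    Estimates the error variance from replicate differences, forms the
    calibrated exposure E[X | W1] under a linear model, and regresses y on it.
    """
    var_u = np.var(w1 - w2, ddof=1) / 2.0    # var(W1 - W2) = 2 * var(U)
    var_w = np.var(w1, ddof=1)               # var(W) = var(X) + var(U)
    lam = (var_w - var_u) / var_w            # estimated attenuation factor
    x_hat = w1.mean() + lam * (w1 - w1.mean())  # calibrated exposure
    return ols_slope(y, x_hat), lam


if __name__ == "__main__":
    y, w1, w2 = simulate()
    naive = ols_slope(y, w1)
    corrected, lam = regression_calibration_slope(y, w1, w2)
    print(f"naive slope:              {naive:.3f}")
    print(f"estimated attenuation:    {lam:.3f}")
    print(f"calibrated slope:         {corrected:.3f}")
```

The sketch deliberately uses only the first replicate as the main-study measurement and treats the replicates as the ancillary information about the error variance. It ignores the extra uncertainty from estimating the attenuation factor, which in practice would need to be reflected in standard errors, for example by bootstrapping.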
