STRATOS guidance document on measurement error and misclassification of variables in observational epidemiology: Part 2—More complex methods of adjustment and advanced topics

We continue our review of issues related to measurement error and misclassification in epidemiology. We further describe methods of adjusting for biased estimation caused by measurement error in continuous covariates, covering likelihood methods, Bayesian methods, moment reconstruction, moment-adjusted imputation, and multiple imputation. We then describe which of these methods can also be used with misclassification of categorical covariates. Methods of adjusting estimation of distributions of continuous variables for measurement error are then reviewed. Illustrative examples are provided throughout these sections. We provide lists of available software for implementing these methods, and the code for our examples is given in the Supporting Information. Next, we present several advanced topics, including data subject to both classical and Berkson error, modeling continuous exposures with measurement error and categorical exposures with misclassification in the same model, variable selection when some of the variables are measured with error, adjusting analyses or design for error in an outcome variable, and categorizing continuous variables measured with error. Finally, we provide some advice for the commonly encountered situations where variables are known to be measured with substantial error, but there is only an external reference standard or partial (or no) information about the type or magnitude of the error.
