Parametric Regression Analysis with Covariate Misclassification in Main Study/Validation Study Designs

Abstract: Measurement error and misclassification have long been a concern in many fields, including medicine, epidemiology, survey sampling, and the analysis of administrative health care data. Both may seriously degrade the quality of estimation and inference and should be avoided whenever possible. In practice, however, measurements inevitably contain error for a variety of reasons, so statistical strategies are needed to cope with the problem. Although many inference methods have been proposed in the literature to address mismeasurement effects, some important issues remain unexplored. In particular, it is generally unclear how the available methods perform relative to each other. In this paper, capitalizing on the unique features of discrete variables, we consider settings with misclassified binary covariates and investigate issues concerning covariate misclassification; our development parallels available strategies for handling measurement error in continuous covariates. Under a unified framework, we examine a number of valid inferential procedures for practical settings where a validation study, either internal or external, is available in addition to the main study. Furthermore, we compare the relative performance of these methods and make practical recommendations.
