Bias Correction Methods for Misclassified Covariates in the Cox Model: Comparison of Five Correction Methods by Simulation and Data Analysis

Measurement error and misclassification are commonplace in research whenever variables cannot be measured accurately. Many statistical methods have been developed to address this problem in a variety of settings. However, relatively few methods are available for handling misclassified categorical exposure variables in the Cox proportional hazards regression model. In this article, we review and compare, by simulation, several approaches to this problem: naive methods, regression calibration, pooled estimation, multiple imputation, corrected score estimation, and the misclassification SIMEX (MC-SIMEX). The methods are also applied to a life course study with recalled data and historical records. In practice, measurement error and misclassification should be accounted for in both design and analysis whenever possible. In the analysis, it may also be preferable to apply more than one correction method for estimation and inference, with a proper understanding of the assumptions underlying each.
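As a rough illustration of why a naive analysis may need correction, the following Python sketch simulates a binary exposure, misclassifies it nondifferentially, and fits a single-covariate Cox model to both the true and the error-prone version. It is not the simulation design of the article: the sample size, baseline hazard, 20% flip probability, administrative censoring time, and the helper names `cox_fit` and `simulate_naive_bias` are all assumptions chosen for illustration, and the Cox fit is a minimal Newton-Raphson routine that assumes no tied event times.

```python
import numpy as np


def cox_fit(time, event, x, max_iter=50, tol=1e-8):
    """Log hazard ratio for one covariate by Newton-Raphson on the Cox
    partial likelihood (assumes no tied event times)."""
    order = np.argsort(time)
    d = event[order].astype(float)
    x = x[order].astype(float)
    beta = 0.0
    for _ in range(max_iter):
        w = np.exp(beta * x)
        # Reverse cumulative sums give sums over each risk set {j: t_j >= t_i}.
        s0 = np.cumsum(w[::-1])[::-1]
        s1 = np.cumsum((w * x)[::-1])[::-1]
        s2 = np.cumsum((w * x * x)[::-1])[::-1]
        score = np.sum(d * (x - s1 / s0))
        info = np.sum(d * (s2 / s0 - (s1 / s0) ** 2))
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


def simulate_naive_bias(n=5000, beta=np.log(2.0), flip_prob=0.2, seed=0):
    """Return (estimate using true exposure, naive estimate using the
    misclassified exposure). All settings are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, size=n)                  # true binary exposure
    rate = 0.1 * np.exp(beta * x)                     # exponential hazard
    t = rng.exponential(1.0 / rate)                   # latent event times
    c = 10.0                                          # administrative censoring
    time = np.minimum(t, c)
    event = (t <= c).astype(int)
    # Nondifferential misclassification: flip exposure with fixed probability,
    # independently of the outcome.
    flip = rng.random(n) < flip_prob
    w = np.where(flip, 1 - x, x)
    return cox_fit(time, event, x), cox_fit(time, event, w)


if __name__ == "__main__":
    b_true, b_naive = simulate_naive_bias()
    print(f"true exposure: {b_true:.3f}, misclassified exposure: {b_naive:.3f}")
```

Under these assumed settings the naive estimate should be pulled toward zero relative to the estimate based on the true exposure. That attenuation is the bias the correction methods listed above aim to remove.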
