Adjusting for differential misclassification in matched case‐control studies utilizing health administrative data

In epidemiological studies that use secondary data sources, accurate disease classifications are often unavailable, so investigators rely on diagnostic codes generated by physicians or hospital systems to identify case and control groups. The result is an imperfect assessment of the disease under investigation. Moreover, because coding practices differ among physicians, it is hard to determine which factors affect the chance that a disease status is assigned incorrectly. This creates a dilemma: assumptions of non-differential misclassification are questionable, yet they are necessary to proceed with statistical analyses. This paper develops an approach that adjusts exposure-disease association estimates for disease misclassification without requiring simplifying non-differentiality assumptions or prior information about a complicated classification mechanism. We propose to leverage rich temporal information on disease-specific healthcare utilization to estimate each participant's probability of being a true case, and to use these estimates as weights in a Bayesian analysis of matched case-control data. The approach is applied to data from a recent observational study of the early symptoms of multiple sclerosis (MS), in which MS cases were identified from Canadian health administrative databases and matched to population controls that are assumed to be correctly classified. A comparison of our results with those from analyses adjusted under non-differential misclassification reveals conflicting inferences and shows that ill-suited assumptions of non-differential misclassification can exacerbate biases in association estimates.
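
The idea of using true-case probabilities as weights can be illustrated with a minimal sketch. The abstract does not specify how the weights enter the model, so everything here is an assumption made for illustration: 1:1 matching, a single binary exposure, a weighted conditional logistic log-likelihood in which each matched pair is multiplied by the estimated probability that its case is a true case, a normal prior on the log odds ratio, and a grid search for the posterior mode. The function names `weighted_log_posterior` and `posterior_mode_on_grid` are hypothetical, and this is not the authors' model.

```python
import numpy as np


def weighted_log_posterior(beta, x_case, x_ctrl, w, prior_sd=10.0):
    """Unnormalized log posterior for the log odds ratio `beta`.

    Assumed setup (not taken from the paper): 1:1 matched pairs, one
    exposure per subject, and pair-level weights `w` equal to the
    estimated probability that the case in each pair is a true case.
    """
    x_case = np.asarray(x_case, dtype=float)
    x_ctrl = np.asarray(x_ctrl, dtype=float)
    w = np.asarray(w, dtype=float)
    # Conditional logistic contribution of each pair, computed stably:
    # beta * x_case - log(exp(beta * x_case) + exp(beta * x_ctrl))
    log_lik = beta * x_case - np.logaddexp(beta * x_case, beta * x_ctrl)
    # Normal(0, prior_sd^2) prior on beta, up to an additive constant
    log_prior = -0.5 * (beta / prior_sd) ** 2
    return float(np.sum(w * log_lik) + log_prior)


def posterior_mode_on_grid(x_case, x_ctrl, w, prior_sd=10.0, grid=None):
    """Return the grid point that maximizes the weighted log posterior."""
    if grid is None:
        grid = np.linspace(-5.0, 5.0, 10001)
    values = [weighted_log_posterior(b, x_case, x_ctrl, w, prior_sd) for b in grid]
    return float(grid[int(np.argmax(values))])


# Toy data (invented for illustration): 30 pairs with an exposed case and
# an unexposed control, 10 pairs with the reverse pattern.
x_case = [1] * 30 + [0] * 10
x_ctrl = [0] * 30 + [1] * 10

# All cases treated as certainly true cases.
mode_unweighted = posterior_mode_on_grid(x_case, x_ctrl, [1.0] * 40, prior_sd=100.0)

# Cases in the first 30 pairs are only 50% likely to be true cases.
weights = [0.5] * 30 + [1.0] * 10
mode_weighted = posterior_mode_on_grid(x_case, x_ctrl, weights, prior_sd=100.0)

print(mode_unweighted, mode_weighted)
```

With a nearly flat prior, the unweighted mode sits close to the log of the discordant-pair ratio, log(30/10). Down-weighting the pairs whose case status is doubtful pulls the estimate toward log(15/10), which shows how case-specific weights can change an association estimate when uncertainty about case status is related to exposure.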
