Practice of Epidemiology

Controlling for Informed Presence Bias Due to the Number of Health Encounters in an Electronic Health Record

Electronic health records (EHRs) are an increasingly used resource for clinical research. Their size creates many analytical opportunities, but, as with most observational data, they are also susceptible to bias. One key source of bias in EHRs is what we term informed presence: the notion that inclusion in an EHR is not random but rather indicates that the subject is ill, making people in EHRs systematically different from those not in EHRs. In this article, we use simulated and empirical data to illustrate the conditions under which such bias can arise and show how conditioning on the number of health-care encounters can be one way to remove it. In doing so, we also show when this approach can induce M bias, that is, bias from conditioning on a collider. Finally, we explore the conditions under which the number of medical encounters can serve as a proxy for general health. We apply these methods to an EHR data set from a university medical center covering the years 2007–2013.
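The central argument, that conditioning on the number of encounters removes informed presence bias under one causal structure but induces M bias under another, can be illustrated with a small simulation. This is a minimal sketch, not the authors' simulation: every data-generating parameter (prevalences, visit probabilities, a 25% per-visit chance of recording a condition), the helper names (`informed_presence`, `m_structure`, `binomial`), and the choice of a Mantel-Haenszel odds ratio stratified on visit count are hypothetical assumptions made for illustration. In both scenarios the true exposure-outcome odds ratio is 1.

```python
import random
from collections import defaultdict


def mantel_haenszel_or(records):
    """Mantel-Haenszel odds ratio pooled over strata.

    `records` yields (stratum, exposed, outcome) with boolean exposure/outcome.
    Each stratum forms a 2x2 table: a = exposed with outcome, b = exposed
    without outcome, c = unexposed with outcome, d = unexposed without outcome.
    """
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for stratum, exposed, outcome in records:
        cell = (0 if exposed else 2) + (0 if outcome else 1)
        tables[stratum][cell] += 1
    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den


def crude_or(records):
    # Put everyone in a single stratum, i.e. ignore the encounter count.
    return mantel_haenszel_or((0, x, y) for _, x, y in records)


def binomial(n, p, rng):
    # Hypothetical visit count: successes in n independent Bernoulli(p) trials.
    return sum(rng.random() < p for _ in range(n))


def informed_presence(n_people, rng):
    """Scenario A (hypothetical): encounters drive only what gets recorded.

    True exposure and outcome are independent of each other and of visits;
    each visit records an existing condition with probability 0.25.
    """
    records = []
    for _ in range(n_people):
        frail = rng.random() < 0.3
        visits = binomial(10, 0.6 if frail else 0.15, rng)
        p_recorded = 1 - 0.75 ** visits  # P(recorded at least once)
        exposed = rng.random() < 0.2 and rng.random() < p_recorded
        outcome = rng.random() < 0.2 and rng.random() < p_recorded
        records.append((visits, exposed, outcome))
    return records


def m_structure(n_people, rng):
    """Scenario B (hypothetical): visits are a collider, U1 -> visits <- U2.

    U1 raises exposure risk, U2 raises outcome risk, and both raise the visit
    count. Exposure and outcome are recorded perfectly and are independent.
    """
    records = []
    for _ in range(n_people):
        u1 = rng.random() < 0.5
        u2 = rng.random() < 0.5
        exposed = rng.random() < (0.7 if u1 else 0.1)
        outcome = rng.random() < (0.7 if u2 else 0.1)
        visits = binomial(10, 0.05 + 0.4 * u1 + 0.4 * u2, rng)
        records.append((visits, exposed, outcome))
    return records


rng = random.Random(2016)
scenario_a = informed_presence(200_000, rng)
scenario_b = m_structure(200_000, rng)
for name, recs in [("A, informed presence", scenario_a), ("B, M structure", scenario_b)]:
    print(f"Scenario {name}: crude OR = {crude_or(recs):.2f}, "
          f"visit-stratified MH OR = {mantel_haenszel_or(recs):.2f}")
```

In scenario A the crude odds ratio is pushed above 1 because people with many visits are more likely to have both conditions recorded, and stratifying on visit count recovers the null. In scenario B the crude estimate is already close to 1, and stratifying on the collider produces a spurious inverse association, which is the M-bias pattern the abstract warns about.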
