Illustrating Informed Presence Bias in Electronic Health Records Data: How Patient Interactions with a Health System Can Impact Inference

Electronic health record (EHR) data are becoming a primary resource for clinical research. Compared to traditional research data, such as those from clinical trials and epidemiologic cohorts, EHR data have a number of appealing characteristics. However, because EHRs lack mechanisms to ensure that the appropriate data are collected, they also pose a number of analytic challenges. In this paper, we illustrate how a patient's interactions with a health system influence which data are recorded in the EHR. These interactions are typically informative about the patient's underlying health, potentially resulting in bias. We term the overall set of induced biases informed presence. To illustrate this, we use examples from EHR-based analyses. Specifically, we show that: 1) where a patient receives services within a health facility can induce selection bias; 2) which health system a patient chooses for an encounter can result in information bias; and 3) referral encounters can create an admixture bias. While addressing these biases is often straightforward, it is important to understand how they are induced in any EHR-based analysis.
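The general mechanism behind informed presence can be sketched with a toy simulation. This is a hypothetical illustration, not the analysis performed in this paper. It assumes that encounter counts are Poisson distributed, that patients with the condition visit more often (rates of 6 versus 2 per period), and that a study keeps only patients with at least 4 recorded encounters. All of these values are arbitrary choices, and the function names are illustrative. Because inclusion depends on how often a patient interacts with the health system, the retained cohort over-represents sicker patients.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson(lam) count with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_informed_presence(n_patients=100_000, prevalence=0.10,
                               rate_healthy=2.0, rate_sick=6.0,
                               min_encounters=4, seed=0):
    """Return (true prevalence, prevalence among patients kept by the filter).

    All parameters are hypothetical. Sick patients generate more encounters,
    so a "sufficient data" inclusion rule (>= min_encounters) over-samples them.
    """
    rng = random.Random(seed)
    n_sick = n_kept = n_sick_kept = 0
    for _ in range(n_patients):
        sick = rng.random() < prevalence
        # The number of encounters depends on health status: this is the informative part.
        encounters = poisson(rng, rate_sick if sick else rate_healthy)
        n_sick += sick
        # Only patients who interacted often enough appear in the analytic cohort.
        if encounters >= min_encounters:
            n_kept += 1
            n_sick_kept += sick
    return n_sick / n_patients, n_sick_kept / n_kept


if __name__ == "__main__":
    true_prev, observed_prev = simulate_informed_presence()
    print(f"true prevalence:     {true_prev:.3f}")
    print(f"observed prevalence: {observed_prev:.3f}")
```

Under these assumed settings, the filter retains roughly 85% of sick patients but only about 14% of healthy ones. As a result, the prevalence estimated from the retained cohort comes out near 0.40, while the true value is near 0.10. No analytic method was at fault. The bias arises entirely from which patients had enough interactions to be recorded.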
