How and when informative visit processes can bias inference when using electronic health records data for clinical research

OBJECTIVE: Electronic health records (EHR) data have become a central data source for clinical research. One concern with using EHR data is that the process through which individuals engage with the health system, and thereby come to appear in EHR data, can be informative. We have termed this process informed presence. In this study, we use simulation and real data to assess how informed presence can affect inference.

MATERIALS AND METHODS: We first simulated a visit process in which a series of biomarkers were observed informatively and uninformatively over time. We then compared inference derived from a randomized controlled trial (ie, uninformative visits) with inference derived from EHR data (ie, potentially informative visits).

RESULTS: We find that bias arises only when there is a strong association both between the biomarker and the outcome and between the biomarker and the visit process. Moreover, once some uninformative visits are present, this bias is mitigated. In the data example, we find that when the "true" associations are null, no bias is observed.

DISCUSSION: These results suggest that an informative visit process can exaggerate an association but cannot induce one. Furthermore, when some noninformative visits are included, careful study design can mitigate the potential bias.

CONCLUSIONS: While there are legitimate concerns about the biases that "messy" EHR data may induce, the conditions for such biases are extreme and can be accounted for.
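The simulation is only summarized above, so the following is a minimal, hypothetical sketch of the kind of experiment it describes, not the authors' actual design. It assumes a latent biomarker level per subject, Gaussian measurement noise at each time point, a logistic visit probability that depends on the current biomarker value, and a continuous outcome that depends only on the latent level. The function `simulate_slope` and its parameters (`beta` for the biomarker-outcome association, `gamma` for the biomarker-visit association, `alpha` for the baseline visit rate, `scheduled_every` for forced uninformative visits) are illustrative assumptions. Setting `gamma = 0` makes every visit uninformative.

```python
import math
import random


def simulate_slope(n=2000, n_times=10, beta=1.0, gamma=0.0,
                   alpha=0.0, scheduled_every=0, seed=1):
    """Hypothetical informed-presence simulation (illustrative only).

    beta            -- true association between latent biomarker level and outcome
    gamma           -- biomarker -> visit association (0 = uninformative visits)
    alpha           -- baseline log-odds of a visit at each time point
    scheduled_every -- if > 0, a visit is forced every `scheduled_every`
                       time points regardless of the biomarker
    Returns the OLS slope of the outcome on the mean observed biomarker.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        mu = rng.gauss(0.0, 1.0)            # subject's latent biomarker level
        y = beta * mu + rng.gauss(0.0, 1.0)  # outcome depends on latent level only
        observed = []
        for t in range(1, n_times + 1):
            x = mu + rng.gauss(0.0, 1.0)    # biomarker value at time t
            p_visit = 1.0 / (1.0 + math.exp(-(alpha + gamma * x)))
            scheduled = scheduled_every > 0 and t % scheduled_every == 0
            if scheduled or rng.random() < p_visit:
                observed.append(x)          # biomarker is recorded only at a visit
        if observed:                        # subjects with no visits never enter the data
            xs.append(sum(observed) / len(observed))
            ys.append(y)
    # Ordinary least-squares slope of y on the mean observed biomarker
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


if __name__ == "__main__":
    for gamma in (0.0, 2.0):
        for beta in (0.0, 1.0):
            slope = simulate_slope(beta=beta, gamma=gamma)
            print(f"gamma={gamma} beta={beta} slope={slope:.3f}")
    # Same informative process, with a forced visit every third time point
    print(f"scheduled: {simulate_slope(beta=1.0, gamma=2.0, scheduled_every=3):.3f}")
```

Comparing the estimated slope across `beta`, `gamma`, and `scheduled_every` is one way to probe the claims in the abstract under these assumptions. When `beta = 0`, the outcome is independent of everything that determines the visits, so the expected slope is zero for any value of `gamma`. This mirrors the point that an informative visit process alone cannot create an association.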
