Biases in electronic health record data due to processes within the healthcare system: retrospective observational study

Abstract

Objective To evaluate on a large scale, across 272 common types of laboratory tests, the impact of healthcare processes on the predictive value of electronic health record (EHR) data.

Design Retrospective observational study.

Setting Two large hospitals in Boston, Massachusetts, with inpatient, emergency, and ambulatory care.

Participants All 669 452 patients treated at the two hospitals over one year between 2005 and 2006.

Main outcome measures The relative predictive accuracy of each laboratory test for three year survival, using the time of day, day of the week, and ordering frequency of the test, compared with the value of the test result.

Results The presence of a laboratory test order, regardless of any other information about the test result, has a significant association (P<0.001) with the odds of survival in 233 of 272 (86%) tests. Data about the timing of when laboratory tests were ordered were more accurate than the test results in predicting survival in 118 of 174 (68%) tests.

Conclusions Healthcare processes must be addressed and accounted for in the analysis of observational health data. Without careful consideration of context, EHR data are unsuitable for many research questions. However, if explicitly modeled, the same processes that make EHR data complex can be leveraged to gain insight into patients' state of health.
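To make the "association with the odds of survival" concrete, the following is a minimal sketch of an odds ratio with a Wald 95% confidence interval computed from a 2x2 table (test ordered or not, by survived or died). It is an illustration only: the abstract does not state which statistical model the study used, the function name `odds_ratio_with_ci` is hypothetical, and the counts are invented for the example rather than taken from the study.

```python
import math


def odds_ratio_with_ci(ordered_survived, ordered_died,
                       not_ordered_survived, not_ordered_died, z=1.96):
    """Odds ratio of survival for patients with a test order versus without.

    Returns (odds_ratio, ci_low, ci_high) using the Wald interval on the
    log odds ratio. All four counts must be positive.
    """
    odds_ordered = ordered_survived / ordered_died
    odds_not_ordered = not_ordered_survived / not_ordered_died
    odds_ratio = odds_ordered / odds_not_ordered

    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(
        1 / ordered_survived + 1 / ordered_died
        + 1 / not_ordered_survived + 1 / not_ordered_died
    )
    log_or = math.log(odds_ratio)
    return (odds_ratio,
            math.exp(log_or - z * se_log_or),
            math.exp(log_or + z * se_log_or))


# Hypothetical counts, NOT data from the study.
or_hat, ci_low, ci_high = odds_ratio_with_ci(900, 100, 1900, 100)
print(f"OR = {or_hat:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

In this invented example the interval lies entirely below 1, which is the pattern one would see if merely having the test ordered were associated with lower odds of survival, independent of the result itself.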
