DQe-v: A Database-Agnostic Framework for Exploring Variability in Electronic Health Record Data Across Time and Site Location

Data variability is a commonly observed phenomenon in Electronic Health Record (EHR) data networks. A common question in scientific investigations of EHR data is whether variability across sites and over time reflects an underlying data quality error at one or more contributing sites or actual differences driven by idiosyncrasies of the healthcare settings. Although research analysts and data scientists commonly use statistical methods to detect and account for variability in analytic datasets, self-service tools for exploring cross-organizational variability in EHR data warehouses are lacking, and such exploration could benefit from meaningful data visualizations. DQe-v, an interactive, database-agnostic tool for visually exploring variability in EHR data, provides such a solution. DQe-v is built on an open-source platform, the R statistical software, with annotated scripts and a readme document that make it fully reproducible. To illustrate DQe-v’s functionality, we describe its readme document, which includes a complete guide to installing and running the program and interpreting its outputs. We also provide annotated R scripts and an example dataset as supplemental materials. DQe-v offers a self-service tool to visually explore data variability within EHR datasets irrespective of the data model. GitHub and CIELO host and distribute the tool and can facilitate collaboration across any interested community of users as we work to improve usability, efficiency, and interoperability.
