Exploring completeness in clinical data research networks with DQe-c

Abstract

Objective: To provide an open-source, interoperable, and scalable data quality assessment tool for evaluating and visualizing completeness and conformance in electronic health record (EHR) data repositories.

Materials and Methods: This article describes the tool's design and architecture and gives an overview of its outputs using a sample dataset of 200 000 randomly selected patient records with at least one encounter since January 1, 2010, extracted from the Research Patient Data Registry (RPDR) at Partners HealthCare. All code and instructions for running the tool and interpreting its results are provided in the Supplementary Appendix.

Results: DQe-c produces a web-based report that summarizes data completeness and conformance in a given EHR data repository through descriptive graphics and tables. Results from running the tool on the sample RPDR data are organized into 4 sections: load and test details, completeness test, data model conformance test, and test of missingness in key clinical indicators.

Discussion: Open science, interoperability across major clinical informatics platforms, and scalability to large databases are key design considerations for DQe-c. Iterative implementation of the tool across different institutions led us to improve its scalability and interoperability and to find ways to simplify local setup.

Conclusion: EHR data quality assessment has been hampered by reliance on ad hoc processes. The architecture and implementation of DQe-c offer valuable insights for developing reproducible and scalable data science tools to assess, manage, and process data in clinical data repositories.
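To make the three kinds of checks named in the Results concrete, here is a minimal Python sketch of how such measures could be computed. It is not DQe-c's code and does not reproduce its method. The function names, the rule that None and empty strings count as missing, the i2b2-style table and column names (patient_dimension, visit_dimension, birth_date, sex_cd), and the indicator codes "BP" and "HBA1C" are all hypothetical and chosen only for illustration. The sketch computes three things: the percentage of missing values per column (completeness), the expected columns absent from each table (data model conformance), and the percentage of patients with no record of a key clinical indicator.

```python
from typing import Dict, Iterable, List, Mapping, Optional, Sequence, Tuple

# Hypothetical row representation: column name -> value.
Row = Mapping[str, Optional[object]]


def is_missing(value: Optional[object]) -> bool:
    """Assumption: None and empty/whitespace-only strings count as missing."""
    if value is None:
        return True
    return isinstance(value, str) and value.strip() == ""


def column_missingness(rows: Sequence[Row], columns: Iterable[str]) -> Dict[str, float]:
    """Completeness: percentage of rows in which each column is missing.

    A column absent from a row is also counted as missing.
    """
    total = len(rows)
    result: Dict[str, float] = {}
    for col in columns:
        missing = sum(1 for row in rows if is_missing(row.get(col)))
        result[col] = 100.0 * missing / total if total else 0.0
    return result


def conformance_gaps(
    expected: Mapping[str, Iterable[str]],
    actual: Mapping[str, Iterable[str]],
) -> Dict[str, List[str]]:
    """Conformance: expected columns not found in each table.

    A table that is absent entirely reports all of its expected columns.
    Tables with no gaps are omitted from the result.
    """
    gaps: Dict[str, List[str]] = {}
    for table, cols in expected.items():
        present = set(actual.get(table, ()))
        absent = [c for c in cols if c not in present]
        if absent:
            gaps[table] = absent
    return gaps


def indicator_missingness(
    patient_ids: Iterable[int],
    observations: Iterable[Tuple[int, str]],
    indicator: str,
) -> float:
    """Key indicators: percentage of patients with no observation of `indicator`."""
    ids = set(patient_ids)
    seen = {pid for pid, code in observations if code == indicator}
    return 100.0 * len(ids - seen) / len(ids) if ids else 0.0


# Illustrative data with hypothetical i2b2-style names (not from the paper).
patients = [
    {"patient_num": 1, "birth_date": "1970-01-01", "sex_cd": "F"},
    {"patient_num": 2, "birth_date": None, "sex_cd": "M"},
    {"patient_num": 3, "birth_date": "", "sex_cd": "F"},
    {"patient_num": 4, "birth_date": "1985-05-05", "sex_cd": None},
]
expected_model = {
    "patient_dimension": ["patient_num", "birth_date", "sex_cd"],
    "visit_dimension": ["encounter_num", "patient_num", "start_date"],
}
actual_schema = {"patient_dimension": ["patient_num", "birth_date"]}
# (patient id, hypothetical indicator code) pairs.
observations = [(1, "BP"), (1, "BP"), (3, "BP"), (2, "HBA1C")]

print(column_missingness(patients, ["patient_num", "birth_date", "sex_cd"]))
# {'patient_num': 0.0, 'birth_date': 50.0, 'sex_cd': 25.0}
print(conformance_gaps(expected_model, actual_schema))
# {'patient_dimension': ['sex_cd'], 'visit_dimension': ['encounter_num', 'patient_num', 'start_date']}
print(indicator_missingness([1, 2, 3, 4], observations, "BP"))
# 50.0
```

The sketch scans in-memory rows to keep the logic easy to follow. For a repository on the scale of the 200 000-patient sample, the same counts would more plausibly be computed inside the database as SQL aggregates, with only the summary figures passed to a reporting layer.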
