Multisite Evaluation of a Data Quality Tool for Patient-Level Clinical Data Sets

Introduction: Data quality and fitness for analysis are crucial if the outputs of analyses of electronic health record data or administrative claims data are to be trusted by the public and the research community. Methods: We describe a data quality analysis tool, Achilles Heel, developed by the Observational Health Data Sciences and Informatics (OHDSI) collaborative, and compare its outputs when applied to 24 large healthcare datasets across seven organizations. Results: We highlight 12 data quality rules that identified issues in at least 10 of the 24 datasets and provide the full set of 71 rules that identified issues in at least one dataset. Achilles Heel is freely available software that provides a useful starter set of data quality rules and allows further rules to be added. We also present results of a structured email-based interview of all participating sites, which collected qualitative comments about the value of Achilles Heel for data quality evaluation. Discussion: Our analysis represents the first comparison of outputs from a data quality tool that implements a fixed (but extensible) set of data quality rules. Thanks to a common data model, we were able to quickly compare multiple datasets originating from several countries in America, Europe and Asia.
