A longitudinal analysis of data quality in a large pediatric data research network

Objective: PEDSnet is a clinical data research network (CDRN) that aggregates electronic health record data from multiple children's hospitals to enable large-scale research. Assessing data quality to ensure that the data are suitable for research is a key requirement in PEDSnet. This study presents a range of data quality issues identified over a period of 18 months and interprets them to evaluate the research capacity of PEDSnet.

Materials and Methods: Results were generated by a semiautomated data quality assessment workflow. Two investigators reviewed the data quality issues flagged programmatically and discussed them with the data partners' extract-transform-load (ETL) analysts to determine the cause of each issue.

Results: The results include a longitudinal summary of 2182 data quality issues identified across 9 data submission cycles. The metadata from the most recent cycle include annotations for 850 issues, covering the most frequent issue types (e.g., missing data, >300; outliers, >100), the most complex domains (e.g., medications, >160; lab measurements, >140), and the primary causes (e.g., source data characteristics, 83%; ETL errors, 9%).

Discussion: The longitudinal findings show how the network evolved from identifying difficulties in aligning the data to a common data model, to learning the norms of clinical pediatrics and determining research capability.

Conclusion: Although data quality is recognized as a critical aspect of establishing and using a CDRN, the findings of data quality assessments are largely unpublished. This paper presents a real-world account of studying and interpreting data quality findings in a pediatric CDRN, and the lessons learned could be used by other CDRNs.
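
The Results summarize annotated issues in three ways: counts by issue type, counts by domain, and each primary cause's share of all issues. The sketch below shows one minimal way to compute that kind of summary. It is not the authors' workflow. The record schema (`DQIssue` with `issue_type`, `domain`, `cause`), the `summarize` helper, and the toy records are all hypothetical and were chosen only for illustration. They are not PEDSnet data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DQIssue:
    """One annotated data quality issue (hypothetical schema, for illustration only)."""
    issue_type: str  # e.g. "missing data", "outlier"
    domain: str      # e.g. "medications", "lab measurements"
    cause: str       # e.g. "source data characteristic", "ETL error"


def summarize(issues):
    """Count issues by type and by domain, and give each cause's share as a percentage."""
    by_type = Counter(i.issue_type for i in issues)
    by_domain = Counter(i.domain for i in issues)
    by_cause = Counter(i.cause for i in issues)
    total = len(issues)
    # Guard against an empty submission cycle to avoid division by zero.
    cause_pct = (
        {cause: round(100 * n / total, 1) for cause, n in by_cause.items()}
        if total else {}
    )
    return by_type, by_domain, cause_pct


# Hypothetical toy annotations (NOT PEDSnet data).
issues = [
    DQIssue("missing data", "medications", "source data characteristic"),
    DQIssue("missing data", "lab measurements", "source data characteristic"),
    DQIssue("outlier", "lab measurements", "ETL error"),
    DQIssue("missing data", "medications", "source data characteristic"),
]

by_type, by_domain, cause_pct = summarize(issues)
print(by_type.most_common())    # most frequent issue types first
print(by_domain.most_common())  # domains ranked by number of issues
print(cause_pct)                # share of issues per primary cause, in percent
```

Percentages are computed over all annotated issues. The listed causes therefore need not sum to 100% when other, less frequent causes are present, as with the 83% and 9% reported above.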
