Design and Refinement of a Data Quality Assessment Workflow for a Large Pediatric Research Network

Background: Clinical data research networks (CDRNs) aggregate electronic health record data from multiple hospitals to enable large-scale research. A critical step in building a CDRN is the continual evaluation of data quality. The key challenges include determining the assessment coverage on big datasets, handling data variability over time, and facilitating communication with data teams. This study presents the evolution of a systematic workflow for data quality assessment in CDRNs.

Implementation: Using a specific CDRN as a use case, the workflow was iteratively developed and packaged into a toolkit. The resultant toolkit comprises 685 data quality checks to identify data quality issues, procedures to reconcile findings with a history of known issues, and a contemporary GitHub-based reporting mechanism for organized tracking.

Results: During the first two years of network development, the toolkit assisted in discovering over 800 data characteristics and resolving over 1400 programming errors. Longitudinal analysis indicated that the variability in time to resolution (15-day mean, 24-day IQR) is attributable to the underlying cause of the issue, the perceived importance of the domain, and the complexity of assessment.

Conclusions: In the absence of a formalized data quality framework, CDRNs continue to face challenges in data management and query fulfillment. The proposed data quality toolkit was empirically validated on a particular network and is publicly available for other networks. While the toolkit is user-friendly and effective, the usage statistics indicated that the data quality process is very time-intensive, and that sufficient resources should be dedicated to investigating problems and optimizing data for research.
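As an illustration of the reconciliation step mentioned under Implementation, the following is a minimal sketch in Python, not the toolkit's actual code. It assumes that each issue can be identified by a site, table, field, and check code; the `Issue` type, its field names, and the `reconcile` function are hypothetical and introduced only for this example.

```python
from dataclasses import dataclass


# Hypothetical issue record; the real toolkit's schema is not described here.
@dataclass(frozen=True, order=True)
class Issue:
    site: str
    table: str
    field: str
    check_code: str


def reconcile(detected, known):
    """Compare freshly detected issues with a history of known issues.

    Returns a dict with three sorted lists:
      new        - detected now, not in the history (need reporting)
      persisting - detected now and already in the history
      resolved   - in the history but no longer detected
    """
    detected_set = set(detected)
    known_set = set(known)
    return {
        "new": sorted(detected_set - known_set),
        "persisting": sorted(detected_set & known_set),
        "resolved": sorted(known_set - detected_set),
    }


if __name__ == "__main__":
    history = [Issue("siteA", "visit", "start_date", "missing")]
    current = [
        Issue("siteA", "visit", "start_date", "missing"),
        Issue("siteB", "drug", "dose", "outlier"),
    ]
    for status, issues in reconcile(current, history).items():
        print(status, issues)
```

In a workflow of this kind, only the issues classified as new would need to be opened in the tracking system, which keeps previously acknowledged issues from being reported repeatedly.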
