Application of openEHR archetypes to automate data quality rules for electronic health records: a case study

Background Ensuring that data are of appropriate quality is essential for the secondary use of electronic health records (EHRs) in research and clinical decision support. An effective approach to data quality assessment (DQA) is to automate the creation of data quality rules (DQRs). This replaces the time-consuming, labor-intensive manual process of writing DQRs, which makes it difficult to guarantee standardized and comparable DQA results. This paper presents a case study of automatically creating DQRs based on openEHR archetypes in a Chinese hospital to investigate the feasibility and challenges of automating DQA for EHR data.
Methods The clinical data repository (CDR) of the Shanxi Dayi Hospital is an archetype-based relational database. Four steps were undertaken to automatically create DQRs in this CDR database. First, the keywords and features of archetypes relevant to DQA were identified by mapping them to a well-established DQA framework, Kahn's DQA framework. Second, DQR templates corresponding to these keywords and features were written in the structured query language (SQL). Third, the quality constraints were retrieved from the archetypes. Fourth, these quality constraints were automatically converted to DQRs according to the pre-designed templates and the mapping relationships between archetypes and data tables. We used the archetypes of the CDR to automatically create DQRs that meet the quality requirements of the Chinese Application-Level Ranking Standard for EHR Systems (CARSES) and evaluated their coverage by comparison with expert-created DQRs.
Results We used 27 archetypes to automatically create 359 DQRs. Of these, 319 agree with the expert-created DQRs, covering 84.97% (311/366) of the CARSES requirements. The auto-created DQRs covered the four quality domains mandated by the CARSES to varying degrees: 100% (45/45) of consistency, 98.11% (208/212) of completeness, 54.02% (47/87) of conformity, and 50% (11/22) of timeliness requirements.
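Steps two to four of the Methods (SQL templates, constraint retrieval, and template-based conversion) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: archetype parsing is assumed to have already produced simplified `Constraint` records, and the three constraint kinds (`mandatory` for an element whose occurrences are 1..1, `range` for a quantity interval, `value_set` for a list of permitted codes), their assignment to Kahn's completeness and conformity categories, and all table and column names are hypothetical. Each generated rule is a SQL query that counts violating rows.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Constraint:
    """Hypothetical, simplified constraint already extracted from an archetype (step 3)."""
    archetype_id: str
    element_path: str
    kind: str                        # "mandatory", "range" or "value_set"
    lower: Optional[float] = None    # used by "range"
    upper: Optional[float] = None    # used by "range"
    values: Tuple[str, ...] = ()     # used by "value_set"

# Step 2: one SQL template per constraint kind; each query counts violating rows.
# The Kahn category attached to each kind is an illustrative assumption.
TEMPLATES: Dict[str, Tuple[str, str]] = {
    "mandatory": ("completeness",
                  "SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"),
    "range": ("conformity",
              "SELECT COUNT(*) FROM {table} WHERE {column} < {lower} OR {column} > {upper}"),
    "value_set": ("conformity",
                  "SELECT COUNT(*) FROM {table} WHERE {column} NOT IN ({values})"),
}

# Hypothetical archetype-to-table mapping: (archetype_id, path) -> (table, column).
Mapping = Dict[Tuple[str, str], Tuple[str, str]]

def generate_dqrs(constraints: List[Constraint], mapping: Mapping) -> List[Tuple[str, str]]:
    """Step 4: fill the templates using constraint values and the archetype-to-table mapping."""
    rules = []
    for c in constraints:
        key = (c.archetype_id, c.element_path)
        if key not in mapping or c.kind not in TEMPLATES:
            continue  # unmapped element or unsupported constraint kind
        if c.kind == "range" and (c.lower is None or c.upper is None):
            continue  # open-ended ranges are not handled in this sketch
        if c.kind == "value_set" and not c.values:
            continue  # an empty IN () list is not valid standard SQL
        table, column = mapping[key]
        category, template = TEMPLATES[c.kind]
        # Quote each permitted code as a SQL string literal, doubling embedded quotes.
        values = ", ".join("'" + v.replace("'", "''") + "'" for v in c.values)
        sql = template.format(table=table, column=column,
                              lower=c.lower, upper=c.upper, values=values)
        rules.append((category, sql))
    return rules

if __name__ == "__main__":
    import sqlite3

    arch = "openEHR-EHR-OBSERVATION.blood_pressure.v1"  # illustrative ID; paths simplified
    constraints = [
        Constraint(arch, "/systolic", "mandatory"),
        Constraint(arch, "/systolic", "range", 0, 1000),
        Constraint(arch, "/position", "value_set", values=("standing", "sitting", "lying")),
    ]
    mapping = {(arch, "/systolic"): ("bp_obs", "systolic"),
               (arch, "/position"): ("bp_obs", "position")}

    # Hypothetical CDR table with three rows, each violating one rule.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE bp_obs (systolic REAL, position TEXT)")
    db.executemany("INSERT INTO bp_obs VALUES (?, ?)",
                   [(120, "sitting"), (None, "standing"), (1200, "upright")])
    for category, sql in generate_dqrs(constraints, mapping):
        violations = db.execute(sql).fetchone()[0]
        print(category, violations, sql)
```

Running the file executes the generated rules against the in-memory SQLite table, and each rule reports one violating row. Constraints without a table mapping or a matching template are skipped; in this sketch they would be left to expert-written rules.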
Conclusion It is feasible to create DQRs automatically based on openEHR archetypes. This study evaluated the coverage of the auto-created DQRs for a typical DQA task in Chinese hospitals, the CARSES. It also identified challenges in automating DQR creation, such as semantics-based quality requirements and complex constraints involving multiple elements. This research can inform further exploration of DQR auto-creation and contribute to automated DQA.