Evaluation of data completeness in the electronic health record for the purpose of patient recruitment into clinical trials: a retrospective analysis of element presence

Background
Computerized clinical trial recruitment support is one promising field for applying routine care data to clinical research. The primary task is to compare the eligibility criteria defined in trial protocols with the patient data contained in the electronic health record (EHR). To avoid implementing different patient definitions in multi-site trials, all participating research sites should use similar patient data from the EHR. Defining a common set of criteria therefore requires knowing which EHR data elements are commonly available across most EHRs. The objective of this research is to determine, for five tertiary care providers, the extent to which available EHR data cover the eligibility criteria of randomly selected clinical trials.

Methods
Each participating study site selected three clinical trials at random. All eligibility criteria sentences were broken up into independent patient characteristics, which were then assigned to one of the 27 semantic categories for eligibility criteria developed by Luo et al. We report the fraction of patient characteristics with corresponding structured data elements in the EHR and the fraction of patients with available data for these elements. The completeness of EHR data for the purpose of patient recruitment is calculated for each semantic group.

Results
The 351 eligibility criteria from 15 clinical trials contained 706 patient characteristics. On average, 55% of these characteristics could be documented in the EHR. Where corresponding data elements existed, clinical data were available for 64% of all patients. The total completeness of EHR data for recruitment purposes is 35%. The best-performing semantic groups were ‘age’ (89%), ‘gender’ (89%), ‘addictive behaviour’ (74%), ‘disease, symptom and sign’ (64%) and ‘organ or tissue status’ (61%). No data were available for six semantic groups.

Conclusions
There is a significant gap in structure and content between the data documented during patient care and the data required for patient eligibility assessment. Nevertheless, EHR data on the patient's age and gender, as well as selected information on the patient's disease, can be complete enough to effectively support the manual screening process through an intelligent preselection of patients and patient data.
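The Methods combine two fractions: the share of patient characteristics that have a structured EHR data element, and the share of patients with data for those elements. One reading consistent with the reported aggregates (0.55 × 0.64 ≈ 0.35) is that completeness is the product of the two. The sketch below assumes that formula and averages patient coverage over characteristics; the `Characteristic` record, its field names, and the example characteristics and values are hypothetical illustrations, not the study's data model, data, or code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Characteristic:
    # Hypothetical record for one independent patient characteristic
    # extracted from an eligibility criterion (not the study's data model).
    text: str
    semantic_group: str       # e.g. one of the 27 Luo et al. categories
    has_ehr_element: bool     # a structured EHR data element exists for it
    patient_coverage: float   # fraction of patients with data for that element


def completeness(chars: list[Characteristic]) -> float:
    """Assumed formula: element presence x mean patient coverage."""
    if not chars:
        return 0.0
    with_element = [c for c in chars if c.has_ehr_element]
    if not with_element:
        return 0.0  # no structured element means no usable EHR data
    presence = len(with_element) / len(chars)
    coverage = sum(c.patient_coverage for c in with_element) / len(with_element)
    return presence * coverage


def completeness_by_group(chars: list[Characteristic]) -> dict[str, float]:
    """Group characteristics by semantic group and score each group."""
    groups: dict[str, list[Characteristic]] = defaultdict(list)
    for c in chars:
        groups[c.semantic_group].append(c)
    return {group: completeness(members) for group, members in groups.items()}


# Hypothetical example characteristics (illustrative values only).
chars = [
    Characteristic("age >= 18 years", "age", True, 1.0),
    Characteristic("female", "gender", True, 1.0),
    Characteristic("current smoker", "addictive behaviour", True, 0.5),
    Characteristic("alcohol abuse in the last year", "addictive behaviour", False, 0.0),
]

print(completeness_by_group(chars))
# {'age': 1.0, 'gender': 1.0, 'addictive behaviour': 0.25}

# Aggregate check against the reported figures: 55% presence x 64% availability.
print(round(0.55 * 0.64, 2))  # 0.35
```

Averaging coverage per characteristic is one design choice among several; weighting by the number of patients or by trial would give different per-group values, so the sketch only illustrates the structure of the calculation.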