Natural Language Processing for Mimicking Clinical Trial Recruitment in Critical Care: A Semi-Automated Simulation Based on the LeoPARDS Trial

Abstract

Clinical trials often fail to recruit an adequate number of appropriate patients. Identifying eligible trial participants is resource-intensive when it relies on manual review of clinical notes, particularly in critical care settings where the time window for recruitment is short. Automated review of electronic health records has been explored as a way of identifying trial participants, but much of the relevant information is held in unstructured free text rather than in a computable form. We developed an electronic health record pipeline that combines structured data with free text to simulate recruitment into the LeoPARDS trial. We applied an algorithm that identifies eligible patients using a moving 1-hour time window, and compared the set of patients it identified with those actually screened and recruited for the trial. We also manually reviewed clinical records for a random sample of the additional patients who were identified by the algorithm but not identified for screening in the original trial. Our approach identified 308 patients, of whom 208 were screened in the actual trial. It identified all 40 patients with CCHIC data available who were actually recruited to LeoPARDS in our centre. The algorithm identified 96 patients on the same day as manual screening and 62 patients one or two days earlier. Analysis of electronic health records incorporating natural language processing tools could effectively replicate recruitment in a critical care trial and identify some eligible patients at an earlier stage. If implemented in real time, this could improve the efficiency of clinical trial recruitment.
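
The moving 1-hour window can be illustrated with a minimal sketch. The abstract does not state how the window was applied, so this assumes one plausible reading: a patient becomes eligible at the first moment when every required criterion has been observed within the preceding hour. The criterion labels (`vasopressor`, `infection_in_notes`, `sirs`), the `Observation` type and the function `first_eligible_time` are hypothetical and are not the LeoPARDS eligibility criteria or the authors' implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional, Set


@dataclass
class Observation:
    time: datetime
    criterion: str  # hypothetical label, from structured data or from NLP over free text


def first_eligible_time(
    observations: Iterable[Observation],
    required: Set[str],
    window: timedelta = timedelta(hours=1),
) -> Optional[datetime]:
    """Return the earliest time at which every required criterion has been
    observed within the trailing window, or None if that never happens."""
    last_seen = {}  # criterion -> time of its most recent observation
    for obs in sorted(observations, key=lambda o: o.time):
        if obs.criterion not in required:
            continue
        last_seen[obs.criterion] = obs.time
        # Eligible if each criterion was last seen no more than `window` ago.
        if all(
            c in last_seen and obs.time - last_seen[c] <= window
            for c in required
        ):
            return obs.time
    return None


# Illustrative data only (not from the study).
day = datetime(2020, 1, 1)
obs = [
    Observation(day.replace(hour=10, minute=0), "vasopressor"),
    Observation(day.replace(hour=10, minute=30), "infection_in_notes"),
    Observation(day.replace(hour=12, minute=0), "sirs"),
    Observation(day.replace(hour=12, minute=20), "vasopressor"),
    Observation(day.replace(hour=12, minute=40), "infection_in_notes"),
]
required = {"vasopressor", "infection_in_notes", "sirs"}
eligible_at = first_eligible_time(obs, required)
```

In this made-up example the three criteria first fall within a single hour at 12:40, because the 10:00 and 10:30 observations are too old by the time the third criterion appears at 12:00. Comparing such a timestamp with the date of manual screening is one way a "same day" or "one or two days earlier" comparison could be derived.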
